Markowitz's celebrated mean-variance portfolio optimization theory assumes that the means and covariances of the underlying asset returns are known. In practice, they are unknown and have to be estimated from historical data. Plugging the estimates into the efficient frontier that assumes known parameters has led to portfolios that may perform poorly and have counter-intuitive asset allocation weights; this has been referred to as the "Markowitz optimization enigma." After reviewing different approaches in the literature to address these difficulties, we explain the root cause of the enigma and propose a new approach to resolve it. Not only is the new approach shown to provide substantial improvements over previous methods, but it also allows flexible modeling to incorporate dynamic features and fundamental analysis of the training sample of historical data, as illustrated in simulation and empirical studies.

1. Introduction. The mean-variance (MV) portfolio optimization theory of Harry Markowitz (1952, 1959), Nobel laureate in economics, is widely regarded as one of the foundational theories in financial economics. It is a single-period theory on the choice of portfolio weights that provide the optimal tradeoff between the mean (as a measure of profit) and the variance (as a measure of risk) of the portfolio return for a future period. The theory, which will be briefly reviewed in the next paragraph, assumes that the means and covariances of the underlying asset returns are known. How to implement the theory in practice when the means and covariances are unknown parameters has been an intriguing statistical problem in financial economics. This paper proposes a novel approach to resolve the long-standing problem and illustrates it with an empirical study using CRSP (the Center for Research in Security Prices of the University of Chicago) monthly stock price data, which can be accessed via the Wharton Research Data Services at the University of Pennsylvania.

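For a portfolio consisting of $m$ assets (e.g., stocks) with expected returns $\mu_i$, let $w_i$ be the weight of the portfolio's value invested in asset $i$ such that $\sum_{i=1}^m w_i = 1$, and let $\mathbf{w} = (w_1, \ldots, w_m)^T$, $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_m)^T$, $\mathbf{1} = (1, \ldots, 1)^T$. The portfolio return has mean $\mathbf{w}^T\boldsymbol{\mu}$ and variance $\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}$, where $\boldsymbol{\Sigma}$ is the covariance matrix of the asset returns; see Lai and Xing (2008), pages 67, 69-71. Given a target value $\mu_*$ for the mean return of a portfolio, Markowitz characterizes an efficient portfolio by its weight vector $\mathbf{w}_{\mathrm{eff}}$ that solves the optimization problem

$$\mathbf{w}_{\mathrm{eff}} = \arg\min_{\mathbf{w}} \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} \quad \text{subject to } \mathbf{w}^T\boldsymbol{\mu} = \mu_*,\ \mathbf{w}^T\mathbf{1} = 1,\ \mathbf{w} \ge \mathbf{0}. \tag{1.1}$$

When short selling is allowed, the constraint $\mathbf{w} \ge \mathbf{0}$ (i.e., $w_i \ge 0$ for all $i$) in (1.1) can be removed, yielding the following problem that has an explicit solution:

$$\mathbf{w}_{\mathrm{eff}} = \arg\min_{\mathbf{w}:\, \mathbf{w}^T\boldsymbol{\mu} = \mu_*,\, \mathbf{w}^T\mathbf{1} = 1} \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} = \bigl\{B\boldsymbol{\Sigma}^{-1}\mathbf{1} - A\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \mu_*\bigl(C\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} - A\boldsymbol{\Sigma}^{-1}\mathbf{1}\bigr)\bigr\}/D, \tag{1.2}$$

where $A = \boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\mathbf{1} = \mathbf{1}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$, $B = \boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$, $C = \mathbf{1}^T\boldsymbol{\Sigma}^{-1}\mathbf{1}$, and $D = BC - A^2$.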
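The explicit solution (1.2) is easy to check numerically. Below is a minimal Python sketch, assuming short selling is allowed. The names `dot`, `solve` and `markowitz_weights` are hypothetical helpers. The linear systems are solved by plain Gaussian elimination so that no third-party package is needed, and the inputs are the Freq 1 values of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ used in Section 4.2.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def markowitz_weights(mu, sigma, target):
    """Explicit solution (1.2): minimize w'Sigma w s.t. w'mu = target and w'1 = 1."""
    ones = [1.0] * len(mu)
    s_inv_1 = solve(sigma, ones)    # Sigma^{-1} 1
    s_inv_mu = solve(sigma, mu)     # Sigma^{-1} mu
    A = dot(mu, s_inv_1)
    B = dot(mu, s_inv_mu)
    C = dot(ones, s_inv_1)
    D = B * C - A * A
    return [(B * s1 - A * sm + target * (C * sm - A * s1)) / D
            for s1, sm in zip(s_inv_1, s_inv_mu)]


MU = [2.42, 1.88, 1.58, 3.47]
SIGMA = [[1.17, 0.79, 0.84, 1.61],
         [0.79, 0.82, 0.61, 1.23],
         [0.84, 0.61, 1.37, 1.35],
         [1.61, 1.23, 1.35, 2.86]]
w = markowitz_weights(MU, SIGMA, 3.0)
print([round(x, 4) for x in w], sum(w), dot(w, MU))
```

The sketch computes $\boldsymbol{\Sigma}^{-1}\mathbf{1}$ and $\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$ by solving linear systems rather than forming $\boldsymbol{\Sigma}^{-1}$ explicitly. This is the usual, numerically safer choice.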

Markowitz's theory assumes known $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$. Since in practice $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are unknown, a commonly used approach is to estimate them from historical data, under the assumption that returns are i.i.d. A standard model for the price $P_{it}$ of the $i$th asset at time $t$ in finance theory is geometric Brownian motion $dP_{it}/P_{it} = \theta_i\,dt + \sigma_i\,dB_t^{(i)}$, where $\{B_t^{(i)}, t \ge 0\}$ is standard Brownian motion. The discrete-time analog of this price process has returns $r_{it} = (P_{it} - P_{i,t-1})/P_{i,t-1}$, and log returns $\log(P_{it}/P_{i,t-1}) = \log(1 + r_{it})$ that are i.i.d. $N(\theta_i - \sigma_i^2/2, \sigma_i^2)$. Under the standard model, maximum likelihood estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the sample mean $\hat{\boldsymbol{\mu}}$ and the sample covariance matrix $\hat{\boldsymbol{\Sigma}}$. These are also method-of-moments estimates without the assumption of normality, and when the i.i.d. assumption is replaced by weak stationarity (i.e., time-invariant means and covariances). It has been found, however, that replacing $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ in (1.1) or (1.2) by their sample counterparts $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ may perform poorly. A major direction in the literature is therefore to find other (e.g., Bayes and shrinkage) estimators that yield better portfolios when they are plugged into (1.1) or (1.2). An alternative method, introduced by Michaud (1989) to tackle the "Markowitz optimization enigma," is to adjust the plug-in portfolio weights by incorporating the sampling variability of $(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}})$ via the bootstrap. Section 2 gives a brief survey of these approaches.
Let $\mathbf{r}_t = (r_{1t}, \ldots, r_{mt})^T$. Since Markowitz's theory deals with portfolio returns in a future period, it is more appropriate to use the conditional mean and covariance matrix of the future returns $\mathbf{r}_{n+1}$ given the historical data $\mathbf{r}_n, \mathbf{r}_{n-1}, \ldots$, based on a Bayesian model that forecasts the future from the available data. This is preferable to restricting to an i.i.d. model that relates the future to the past via the unknown parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ for future returns, which must then be estimated from past data. More importantly, this Bayesian formulation paves the way for a new approach that generalizes Markowitz's portfolio theory to the case where the means and covariances are unknown. When $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are estimated from data, their uncertainties should be incorporated into the risk. Moreover, it is not possible to attain a target level of mean return as in Markowitz's constraint $\mathbf{w}^T\boldsymbol{\mu} = \mu_*$, since $\boldsymbol{\mu}$ is unknown. To address this root cause of the Markowitz enigma, we introduce in Section 3 a Bayesian approach that assumes a prior distribution for $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and formulates mean-variance portfolio optimization as a stochastic optimization problem. This optimization problem reduces to that of Markowitz when the prior distribution is degenerate. It uses the posterior distribution given current and past observations to incorporate the uncertainties of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ into the variance of the portfolio return $\mathbf{w}^T\mathbf{r}_{n+1}$, where $\mathbf{w}$ is based on the posterior distribution. The constraint in Markowitz's mean-variance formulation can be included in the objective function by using a Lagrange multiplier. The optimization problem then becomes that of finding the weight vector $\mathbf{w}$ that maximizes $E(\mathbf{w}^T\mathbf{r}_{n+1}) - \lambda\,\mathrm{Var}(\mathbf{w}^T\mathbf{r}_{n+1})$, in which $\lambda$ can be regarded as a risk aversion coefficient. To compare with previous frequentist approaches that assume i.i.d. returns, Section 4 introduces a variant of the Bayes rule that uses bootstrap resampling to estimate the performance criterion nonparametrically.

To apply this theory in practice, the investor has to figure out his/her risk aversion coefficient, which may be a difficult task. Markowitz's theory circumvents this by considering the efficient frontier, which is the $(\sigma, \mu)$ curve of efficient portfolios as $\lambda$ varies over all possible values, where $\mu$ is the mean and $\sigma^2$ the variance of the portfolio return. Investors, however, often prefer to use $(\mu - \mu_0)/\sigma_e$, called the information ratio, as a measure of a portfolio's performance, where $\mu_0$ is the expected return of a benchmark investment and $\sigma_e^2$ is the variance of the portfolio's excess return over the benchmark portfolio; see Grinold and Kahn (2000), page 5. The benchmark investment can be a market portfolio (e.g., S&P 500) or some other reference portfolio, or a risk-free bank account with interest rate $\mu_0$ (in which case the information ratio is often called the Sharpe ratio). Note that the information ratio is proportional to $\mu - \mu_0$ and inversely proportional to $\sigma_e$, and can be regarded as the excess return per unit of risk. In Section 5 we describe how $\lambda$ can be chosen for the rule developed in Section 3 to maximize the information ratio. Other statistical issues that arise in practice are also considered in Sections 5 and 6, where they lead to certain modifications of the basic rule. Among them are dimension reduction when $m$ (number of assets) is not small relative to $n$ (number of past periods in the training sample) and departures of the historical data from the working assumption of i.i.d. asset returns. Section 6 illustrates these methods in an empirical study in which the rule thus obtained is compared with other rules proposed in the literature. Some concluding remarks are given in Section 7.
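As a concrete illustration of the information ratio, here is a minimal sketch that computes its sample analog from paired per-period returns of a portfolio and a benchmark. Estimating $\mu - \mu_0$ by the sample mean of the excess returns and $\sigma_e$ by their sample standard deviation is an assumption of this sketch, not a prescription of the text. The function name and the return series are hypothetical.

```python
import statistics


def information_ratio(portfolio, benchmark):
    """Sample analog of (mu - mu_0) / sigma_e from paired per-period returns."""
    excess = [p - b for p, b in zip(portfolio, benchmark)]
    # mean excess return estimates mu - mu_0; its sample std. dev. estimates sigma_e
    return statistics.mean(excess) / statistics.stdev(excess)


port = [0.02, 0.03, 0.01, 0.04]    # illustrative portfolio returns
bench = [0.01, 0.01, 0.01, 0.01]   # illustrative benchmark (e.g., risk-free) returns
ir = information_ratio(port, bench)
print(round(ir, 4))
```

With a constant risk-free benchmark, as in the example, the same function returns a sample Sharpe ratio.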

2. Using better estimates of $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$ or $\mathbf{w}_{\mathrm{eff}}$ to implement Markowitz's portfolio optimization theory. Since $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ in Markowitz's efficient frontier are actually unknown, a natural idea is to replace them by the sample mean vector $\hat{\boldsymbol{\mu}}$ and covariance matrix $\hat{\boldsymbol{\Sigma}}$ of the training sample. However, this plug-in frontier is no longer optimal because $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ actually differ from $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$. Frankfurter, Phillips and Seagle (1976) and Jobson and Korkie (1980) have reported that portfolios associated with the plug-in frontier can perform worse than an equally weighted portfolio that is highly inefficient. Michaud (1989) comments that the minimum variance (MV) portfolio $\hat{\mathbf{w}}_{\mathrm{eff}}$ based on $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ has serious deficiencies, calling the MV optimizers "estimation-error maximizers." His argument is reinforced by subsequent studies, for example, Best and Grauer (1991), Chopra, Hensel and Turner (1993), Canner et al. (1997), Simaan (1997) and Britten-Jones (1999). Three approaches have emerged to address the difficulty during the past two decades. The first approach uses multifactor models to reduce the dimension in estimating $\boldsymbol{\Sigma}$, and the second approach uses Bayes or other shrinkage estimates of $\boldsymbol{\Sigma}$. Both approaches use improved estimates of $\boldsymbol{\Sigma}$ for the plug-in efficient frontier. They have also been modified to provide better estimates of $\boldsymbol{\mu}$, for example, in the quasi-Bayesian approach of Black and Litterman (1990). The third approach uses bootstrapping to correct for the bias of $\hat{\mathbf{w}}_{\mathrm{eff}}$ as an estimate of $\mathbf{w}_{\mathrm{eff}}$.

2.1. Multifactor pricing models. Multifactor pricing models relate the $m$ asset returns $r_i$ to $k$ factors $f_1, \ldots, f_k$ in a regression model of the form

$$r_i = \alpha_i + (f_1, \ldots, f_k)^T\boldsymbol{\beta}_i + \epsilon_i, \tag{2.1}$$

in which $\alpha_i$ and $\boldsymbol{\beta}_i$ are unknown regression parameters and $\epsilon_i$ is an unobserved random disturbance that has mean 0 and is uncorrelated with $\mathbf{f} := (f_1, \ldots, f_k)^T$. The case $k = 1$ is called a single-factor (or single-index) model. Under Sharpe's (1964) capital asset pricing model (CAPM), which assumes, besides known $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, that the market has a risk-free asset with return $r_f$ (interest rate) and that all investors minimize the variance of their portfolios for their target mean returns, (2.1) holds with $k = 1$, $\alpha_i = r_f$ and $f = r_M - r_f$. Here $r_M$ is the return of a hypothetical market portfolio $M$, which can be approximated in practice by an index fund such as the Standard and Poor's (S&P) 500 Index. The arbitrage pricing theory (APT), introduced by Ross (1976), involves neither a market portfolio nor a risk-free asset and states that a multifactor model of the form (2.1) should hold approximately in the absence of arbitrage for sufficiently large $m$. The theory, however, does not specify the factors and their number. Methods for choosing factors in (2.1) can be broadly classified as economic and statistical, and commonly used statistical methods include factor analysis and principal component analysis; see Section 3.4 of Lai and Xing (2008).
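For the single-factor case $k = 1$ of (2.1), the parameters can be estimated by ordinary least squares, one asset at a time. The sketch below is illustrative only and the function names are hypothetical. The implied covariance matrix $\sigma_f^2\boldsymbol{\beta}\boldsymbol{\beta}^T + \mathrm{diag}(\sigma_{\epsilon_1}^2, \ldots, \sigma_{\epsilon_m}^2)$ additionally assumes that the disturbances of different assets are uncorrelated, which (2.1) itself does not state.

```python
import statistics


def single_factor_fit(asset_returns, factor):
    """OLS fit of r_i = alpha_i + beta_i * f + eps_i for each asset (k = 1 in (2.1))."""
    f_bar = statistics.mean(factor)
    sxx = sum((f - f_bar) ** 2 for f in factor)
    alphas, betas, resid_vars = [], [], []
    for r in asset_returns:
        r_bar = statistics.mean(r)
        beta = sum((f - f_bar) * (x - r_bar) for f, x in zip(factor, r)) / sxx
        alpha = r_bar - beta * f_bar
        resid = [x - alpha - beta * f for f, x in zip(factor, r)]
        alphas.append(alpha)
        betas.append(beta)
        # unbiased residual variance: two parameters were fitted per asset
        resid_vars.append(sum(e * e for e in resid) / (len(r) - 2))
    return alphas, betas, resid_vars


def factor_covariance(betas, var_factor, resid_vars):
    """Implied covariance var_f * beta beta' + diag(resid_vars).

    Assumes (beyond (2.1)) that disturbances of different assets are uncorrelated.
    """
    m = len(betas)
    return [[var_factor * betas[i] * betas[j] + (resid_vars[i] if i == j else 0.0)
             for j in range(m)] for i in range(m)]


factor = [1.0, 2.0, 3.0, 4.0, 5.0]                   # illustrative factor series
assets = [[0.5 + 2.0 * f for f in factor],            # exact linear relations,
          [-1.0 + 0.3 * f for f in factor]]           # so OLS recovers them
alphas, betas, rv = single_factor_fit(assets, factor)
```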

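2.2. Bayes and shrinkage estimators. A popular conjugate family of prior distributions for estimation of covariance matrices from i.i.d. normal random vectors with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is

$$\boldsymbol{\mu} \mid \boldsymbol{\Sigma} \sim N(\boldsymbol{\nu}, \boldsymbol{\Sigma}/\kappa), \qquad \boldsymbol{\Sigma} \sim IW_m(\boldsymbol{\Psi}, n_0), \tag{2.2}$$

where $IW_m(\boldsymbol{\Psi}, n_0)$ denotes the inverted Wishart distribution with $n_0$ degrees of freedom and mean $\boldsymbol{\Psi}/(n_0 - m - 1)$. The posterior distribution of $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ given $(\mathbf{r}_1, \ldots, \mathbf{r}_n)$ is also of the same form:

$$\boldsymbol{\mu} \mid \boldsymbol{\Sigma} \sim N\bigl(\hat{\boldsymbol{\mu}}, \boldsymbol{\Sigma}/(n + \kappa)\bigr), \qquad \boldsymbol{\Sigma} \sim IW_m\bigl((n + n_0 - m - 1)\hat{\boldsymbol{\Sigma}}, n + n_0\bigr),$$

where $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ are the Bayes estimators of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ given by

$$\begin{aligned}
\hat{\boldsymbol{\mu}} &= \frac{\kappa}{n + \kappa}\boldsymbol{\nu} + \frac{n}{n + \kappa}\bar{\mathbf{r}}, \\
\hat{\boldsymbol{\Sigma}} &= \frac{n_0 - m - 1}{n + n_0 - m - 1}\,\frac{\boldsymbol{\Psi}}{n_0 - m - 1} + \frac{n}{n + n_0 - m - 1}\left\{\frac{1}{n}\sum_{t=1}^n(\mathbf{r}_t - \bar{\mathbf{r}})(\mathbf{r}_t - \bar{\mathbf{r}})^T + \frac{\kappa}{n + \kappa}(\bar{\mathbf{r}} - \boldsymbol{\nu})(\bar{\mathbf{r}} - \boldsymbol{\nu})^T\right\}.
\end{aligned} \tag{2.3}$$

Note that the Bayes estimator $\hat{\boldsymbol{\Sigma}}$ adds to the MLE of $\boldsymbol{\Sigma}$ the covariance matrix $\kappa(\bar{\mathbf{r}} - \boldsymbol{\nu})(\bar{\mathbf{r}} - \boldsymbol{\nu})^T/(n + \kappa)$, which accounts for the uncertainties due to replacing $\boldsymbol{\mu}$ by $\bar{\mathbf{r}}$. It also shrinks this adjusted covariance matrix toward the prior mean $\boldsymbol{\Psi}/(n_0 - m - 1)$.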
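The Bayes estimators (2.3) translate directly into code. The following is a minimal sketch with a hypothetical function name (`bayes_estimates`), using plain nested lists. Here `returns` is a list of the $n$ return vectors $\mathbf{r}_t$, and the hyperparameter values in the usage lines are made up.

```python
def bayes_estimates(returns, nu, kappa, psi, n0):
    """Posterior means (2.3) of mu and Sigma under the conjugate prior (2.2)."""
    n, m = len(returns), len(nu)
    r_bar = [sum(r[i] for r in returns) / n for i in range(m)]
    # mu_hat shrinks the sample mean toward the prior mean nu
    mu_hat = [(kappa * nu[i] + n * r_bar[i]) / (n + kappa) for i in range(m)]
    denom = n + n0 - m - 1
    sigma_hat = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            mle = sum((r[i] - r_bar[i]) * (r[j] - r_bar[j]) for r in returns) / n
            adj = kappa / (n + kappa) * (r_bar[i] - nu[i]) * (r_bar[j] - nu[j])
            prior_mean = psi[i][j] / (n0 - m - 1)
            sigma_hat[i][j] = (n0 - m - 1) / denom * prior_mean + n / denom * (mle + adj)
    return mu_hat, sigma_hat


rets = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]]           # illustrative data, n = 3, m = 2
mu_hat, sigma_hat = bayes_estimates(rets, nu=[0.5, 0.5], kappa=5.0,
                                    psi=[[2.0, 0.5], [0.5, 1.0]], n0=10)
```

Multiplying (2.3) by $n + n_0 - m - 1$ gives the equivalent posterior scale matrix $\boldsymbol{\Psi} + \sum_{t=1}^n(\mathbf{r}_t - \bar{\mathbf{r}})(\mathbf{r}_t - \bar{\mathbf{r}})^T + \frac{n\kappa}{n+\kappa}(\bar{\mathbf{r}} - \boldsymbol{\nu})(\bar{\mathbf{r}} - \boldsymbol{\nu})^T$. This form is a convenient check on an implementation.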

Simply using $\bar{\mathbf{r}}$ to estimate $\boldsymbol{\mu}$, Ledoit and Wolf (2003, 2004) propose to shrink the MLE of $\boldsymbol{\Sigma}$ toward a structured covariance matrix, instead of directly using this Bayes estimator, which requires specification of the hyperparameters $\boldsymbol{\nu}$, $\kappa$, $n_0$ and $\boldsymbol{\Psi}$. Their rationale is that whereas the MLE $\mathbf{S} = \sum_{t=1}^n(\mathbf{r}_t - \bar{\mathbf{r}})(\mathbf{r}_t - \bar{\mathbf{r}})^T/n$ has a large estimation error when $m(m+1)/2$ is comparable with $n$, a structured covariance matrix $\mathbf{F}$ has far fewer parameters, which can be estimated with smaller variances. They propose to estimate $\boldsymbol{\Sigma}$ by a convex combination of $\hat{\mathbf{F}}$ and $\mathbf{S}$:

$$\hat{\boldsymbol{\Sigma}} = \hat{\delta}\hat{\mathbf{F}} + (1 - \hat{\delta})\mathbf{S}, \tag{2.4}$$

where $\hat{\delta}$ is an estimator of the optimal shrinkage constant $\delta$ used to shrink the MLE toward the estimated structured covariance matrix $\hat{\mathbf{F}}$. Besides the covariance matrix $\mathbf{F}$ associated with a single-factor model, they also suggest using a constant correlation model for $\mathbf{F}$, in which all pairwise correlations are identical, and have found that it gives comparable performance in simulation and empirical studies. They advocate using this shrinkage estimate in lieu of $\mathbf{S}$ in implementing Markowitz's efficient frontier.
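Here is a sketch of the constant-correlation version of (2.4). The target $\hat{\mathbf{F}}$ keeps the sample variances and replaces every pairwise correlation by the average sample correlation. The shrinkage intensity is passed in as a fixed `delta`; Ledoit and Wolf's data-driven estimator $\hat{\delta}$ of the optimal constant is not implemented here, and all names are hypothetical.

```python
import math


def sample_cov_mle(returns):
    """MLE covariance matrix S (divisor n)."""
    n, m = len(returns), len(returns[0])
    mean = [sum(r[i] for r in returns) / n for i in range(m)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in returns) / n
             for j in range(m)] for i in range(m)]


def constant_correlation_target(s):
    """F: sample variances on the diagonal, average sample correlation off it."""
    m = len(s)
    sd = [math.sqrt(s[i][i]) for i in range(m)]
    rhos = [s[i][j] / (sd[i] * sd[j]) for i in range(m) for j in range(i + 1, m)]
    rho_bar = sum(rhos) / len(rhos)
    return [[s[i][i] if i == j else rho_bar * sd[i] * sd[j] for j in range(m)]
            for i in range(m)]


def shrink(s, f, delta):
    """Convex combination (2.4): delta * F + (1 - delta) * S, with delta given."""
    return [[delta * fij + (1 - delta) * sij for fij, sij in zip(frow, srow)]
            for frow, srow in zip(f, s)]


rets = [[1.0, 2.0, 0.5], [2.0, 1.5, 1.0], [0.5, 3.0, 2.0], [1.5, 2.5, 0.0]]
S = sample_cov_mle(rets)
F = constant_correlation_target(S)
S_shrunk = shrink(S, F, delta=0.3)    # delta fixed by assumption, not estimated
```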

The difficulty of estimating $\boldsymbol{\mu}$ well enough for the plug-in portfolio to have reliable performance was pointed out by Black and Litterman (1990), who proposed the following pragmatic quasi-Bayesian approach to address this difficulty. Jorion (1986) had earlier used a shrinkage estimator similar to $\hat{\boldsymbol{\mu}}$ in (2.3), which can be viewed as shrinking a prior mean $\boldsymbol{\nu}$ to the sample mean $\bar{\mathbf{r}}$ (instead of the other way around). By contrast, Black and Litterman's approach basically amounted to shrinking an investor's subjective estimate of $\boldsymbol{\mu}$ to the market's estimate implied by an "equilibrium portfolio." The investor's subjective guess of $\boldsymbol{\mu}$ is described in terms of "views" on linear combinations of asset returns, which can be based on past observations and the investor's personal/expert opinions. These views are represented by $\mathbf{P}\boldsymbol{\mu} \sim N(\mathbf{q}, \boldsymbol{\Omega})$, where $\mathbf{P}$ is a $p \times m$ matrix of the investor's "picks" of the assets to express the guesses, and $\boldsymbol{\Omega}$ is a diagonal matrix that expresses the investor's uncertainties in the views via their variances. The "equilibrium portfolio," denoted by $\tilde{\mathbf{w}}$, is based on a normative theory of an equilibrium market. In this theory, $\tilde{\mathbf{w}}$ is assumed to solve the mean-variance optimization problem $\max_{\mathbf{w}}(\mathbf{w}^T\boldsymbol{\pi} - \lambda\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w})$, with $\lambda$ being the average risk-aversion level of the market and $\boldsymbol{\pi}$ representing the market's view of $\boldsymbol{\mu}$. The theory yields the relation $\boldsymbol{\pi} = 2\lambda\boldsymbol{\Sigma}\tilde{\mathbf{w}}$, which can be used to infer $\boldsymbol{\pi}$ from the market capitalization or benchmark portfolio as a surrogate of $\tilde{\mathbf{w}}$. Incorporating uncertainty in the market's view of $\boldsymbol{\mu}$, Black and Litterman assume that $\boldsymbol{\pi} - \boldsymbol{\mu} \sim N(\mathbf{0}, \tau\boldsymbol{\Sigma})$, in which $\tau \in (0, 1)$ is a small parameter, and also set exogenously $\lambda = 1.2$; see Meucci (2010). Combining $\mathbf{P}\boldsymbol{\mu} \sim N(\mathbf{q}, \boldsymbol{\Omega})$ with $\boldsymbol{\pi} - \boldsymbol{\mu} \sim N(\mathbf{0}, \tau\boldsymbol{\Sigma})$ under a working independence assumption between the two multivariate normal distributions yields the Black-Litterman estimate of $\boldsymbol{\mu}$:

$$\hat{\boldsymbol{\mu}}_{\mathrm{BL}} = \bigl[(\tau\boldsymbol{\Sigma})^{-1} + \mathbf{P}^T\boldsymbol{\Omega}^{-1}\mathbf{P}\bigr]^{-1}\bigl[(\tau\boldsymbol{\Sigma})^{-1}\boldsymbol{\pi} + \mathbf{P}^T\boldsymbol{\Omega}^{-1}\mathbf{q}\bigr], \tag{2.5}$$

with covariance matrix $[(\tau\boldsymbol{\Sigma})^{-1} + \mathbf{P}^T\boldsymbol{\Omega}^{-1}\mathbf{P}]^{-1}$. Various modifications and extensions of their idea have been proposed; see Meucci (2005), pages 426-437, Fabozzi et al. (2007), pages 232-253, and Meucci (2010). These extensions have the basic form (2.5) or some variant thereof, and differ mainly in the normative model used to generate an equilibrium portfolio. Note that (2.5) involves $\boldsymbol{\Sigma}$, which Black and Litterman estimated by using the sample covariance matrix of historical data, and that their focus was to address the estimation of $\boldsymbol{\mu}$ for the plug-in portfolio. Clearly Bayes or shrinkage estimates of $\boldsymbol{\Sigma}$ can be used instead.
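The Black-Litterman combination (2.5) can be sketched as follows, with $\boldsymbol{\pi} = 2\lambda\boldsymbol{\Sigma}\tilde{\mathbf{w}}$ and a diagonal $\boldsymbol{\Omega}$ passed as the list of its diagonal entries. The covariance matrix, equilibrium weights, view and $\tau = 0.05$ in the usage lines are made-up illustrative inputs, and $\lambda = 1.2$ follows the value quoted above. The helper names are hypothetical.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def inverse(a):
    """Matrix inverse, column by column."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def implied_returns(sigma, w_eq, lam):
    """Equilibrium relation pi = 2 * lam * Sigma * w_eq."""
    return [2 * lam * dot(row, w_eq) for row in sigma]


def black_litterman(sigma, pi, tau, P, q, omega_diag):
    """Posterior mean (2.5) for views P mu ~ N(q, diag(omega_diag))."""
    m = len(sigma)
    prec = [[x / tau for x in row] for row in inverse(sigma)]    # (tau Sigma)^{-1}
    rhs = [dot(row, pi) for row in prec]                         # (tau Sigma)^{-1} pi
    for p_row, q_k, om in zip(P, q, omega_diag):
        for i in range(m):
            rhs[i] += p_row[i] * q_k / om                        # + P' Omega^{-1} q
            for j in range(m):
                prec[i][j] += p_row[i] * p_row[j] / om           # + P' Omega^{-1} P
    return solve(prec, rhs)


SIG = [[0.04, 0.01, 0.0], [0.01, 0.09, 0.02], [0.0, 0.02, 0.16]]
pi = implied_returns(SIG, [0.5, 0.3, 0.2], lam=1.2)
# one bullish view: asset 1 beats its equilibrium return by 0.01
mu_bl = black_litterman(SIG, pi, 0.05, [[1.0, 0.0, 0.0]], [pi[0] + 0.01], [0.0004])
```

With a single view on one asset, the posterior mean for that asset is a precision-weighted average of $\pi_1$ and $q$. It therefore lies strictly between them.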

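2.3. Bootstrapping and the resampled frontier. To adjust for the bias of $\hat{\mathbf{w}}_{\mathrm{eff}}$ as an estimate of $\mathbf{w}_{\mathrm{eff}}$, Michaud (1989) uses the average of the bootstrap weight vectors:

$$\bar{\mathbf{w}} = \frac{1}{B}\sum_{b=1}^B \hat{\mathbf{w}}_b^*, \tag{2.6}$$

where $\hat{\mathbf{w}}_b^*$ is the estimated optimal portfolio weight vector based on the $b$th bootstrap sample $\{\mathbf{r}_{b1}^*, \ldots, \mathbf{r}_{bn}^*\}$ drawn with replacement from the observed sample $\{\mathbf{r}_1, \ldots, \mathbf{r}_n\}$. Specifically, the $b$th bootstrap sample has sample mean vector $\hat{\boldsymbol{\mu}}_b^*$ and covariance matrix $\hat{\boldsymbol{\Sigma}}_b^*$, which can be used to replace $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ in (1.1) or (1.2), thereby yielding $\hat{\mathbf{w}}_b^*$. Thus, the resampled efficient frontier corresponds to plotting $\bar{\mathbf{w}}^T\hat{\boldsymbol{\mu}}$ versus $\sqrt{\bar{\mathbf{w}}^T\hat{\boldsymbol{\Sigma}}\bar{\mathbf{w}}}$ for a fine grid of $\mu_*$ values, where $\bar{\mathbf{w}}$ is defined by (2.6), in which $\hat{\mathbf{w}}_b^*$ depends on the target level $\mu_*$.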
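Michaud's averaging (2.6) can be sketched as below for the case where short selling is allowed, so that each bootstrap weight vector comes from the closed form (1.2). For the no-short-selling version (1.1), a quadratic programming solver would replace `markowitz_weights`. The simulated returns, the seeds, the target $\mu_* = 2.5$ and $B = 200$ are illustrative assumptions, and the helper names are hypothetical.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def markowitz_weights(mu, sigma, target):
    """Closed form (1.2), short selling allowed."""
    ones = [1.0] * len(mu)
    s1, smu = solve(sigma, ones), solve(sigma, mu)
    A, B, C = dot(mu, s1), dot(mu, smu), dot(ones, s1)
    D = B * C - A * A
    return [(B * x - A * y + target * (C * y - A * x)) / D for x, y in zip(s1, smu)]


def mean_and_cov(sample):
    """Sample mean vector and MLE covariance matrix (divisor n)."""
    n, m = len(sample), len(sample[0])
    mean = [sum(r[i] for r in sample) / n for i in range(m)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in sample) / n
            for j in range(m)] for i in range(m)]
    return mean, cov


def resampled_weights(returns, target, n_boot=200, seed=0):
    """Average (2.6) of plug-in weights over bootstrap samples."""
    rng = random.Random(seed)
    n = len(returns)
    total = None
    for _ in range(n_boot):
        boot = [returns[rng.randrange(n)] for _ in range(n)]    # draw with replacement
        w_b = markowitz_weights(*mean_and_cov(boot), target)
        total = w_b if total is None else [t + x for t, x in zip(total, w_b)]
    return [t / n_boot for t in total]


data_rng = random.Random(42)
returns = [[data_rng.gauss(m, s) for m, s in zip([2.0, 1.5, 3.0], [1.0, 0.8, 1.6])]
           for _ in range(60)]
mean, cov = mean_and_cov(returns)
w_plug = markowitz_weights(mean, cov, 2.5)    # plug-in weights
w_bar = resampled_weights(returns, 2.5)       # resampled weights (2.6)
```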

3. A stochastic optimization approach. The Bayesian and shrinkage methods in Section 2.2 focus primarily on Bayes estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ (with normal and inverted Wishart priors) and on shrinkage estimators of $\boldsymbol{\Sigma}$. However, the construction of efficient portfolios when $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are unknown is more complicated than trying to estimate them as well as possible and then plugging the estimates into (1.1) or (1.2). Note in this connection that (1.2) involves $\boldsymbol{\Sigma}^{-1}$ instead of $\boldsymbol{\Sigma}$, and that estimating $\boldsymbol{\Sigma}$ as well as possible does not imply that $\boldsymbol{\Sigma}^{-1}$ is reliably estimated. Estimation of a high-dimensional $m \times m$ covariance matrix and its inverse when $m^2$ is not small compared to $n$ has been recognized as a difficult statistical problem and has attracted much recent attention; see, for example, Ledoit and Wolf (2004), Huang et al. (2006), Bickel and Levina (2008) and Fan, Fan and Lv (2008). Some sparsity condition or a low-dimensional factor structure is needed to obtain an estimate that is close to $\boldsymbol{\Sigma}$ and whose inverse is close to $\boldsymbol{\Sigma}^{-1}$. The conjugate prior family (2.2) that motivates the (linear) shrinkage estimators (2.3) or (2.4), however, does not reflect such sparsity. For high-dimensional weight vectors $\hat{\mathbf{w}}_{\mathrm{eff}}$, direct application of the bootstrap for bias correction is also problematic.

A major difficulty is shared by the "plug-in" efficient frontier (which uses $\hat{\boldsymbol{\Sigma}}$ to estimate $\boldsymbol{\Sigma}$ and $\bar{\mathbf{r}}$ to estimate $\boldsymbol{\mu}$), by its variants that estimate $\boldsymbol{\Sigma}$ by (2.4) and $\boldsymbol{\mu}$ by (2.3) or the Black-Litterman method, and by its "resampled" version. In all of them, Markowitz's idea of using the variance of $\mathbf{w}^T\mathbf{r}_{n+1}$ as a measure of the portfolio's risk cannot be captured simply by the plug-in estimates $\mathbf{w}^T\hat{\boldsymbol{\Sigma}}\mathbf{w}$ of $\mathrm{Var}(\mathbf{w}^T\mathbf{r}_{n+1})$ and $\mathbf{w}^T\hat{\boldsymbol{\mu}}$ of $E(\mathbf{w}^T\mathbf{r}_{n+1})$. This difficulty was recognized by Broadie (1993), who used the terms true frontier and estimated frontier to refer to Markowitz's efficient frontier (with known $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$) and the plug-in efficient frontier, respectively. He also suggested considering the actual mean and variance of the return of an estimated frontier portfolio. The problem of minimizing $\mathrm{Var}(\mathbf{w}^T\mathbf{r}_{n+1})$ subject to a given level $\mu_*$ of the mean return $E(\mathbf{w}^T\mathbf{r}_{n+1})$ is meaningful in Markowitz's framework, in which both $E(\mathbf{r}_{n+1})$ and $\mathrm{Cov}(\mathbf{r}_{n+1})$ are known. In contrast, the surrogate problem of minimizing $\mathbf{w}^T\hat{\boldsymbol{\Sigma}}\mathbf{w}$ under the constraint $\mathbf{w}^T\hat{\boldsymbol{\mu}} = \mu_*$ ignores the fact that both $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ have inherent errors (risks) themselves. In this section we consider the more fundamental problem

$$\max\bigl\{E(\mathbf{w}^T\mathbf{r}_{n+1}) - \lambda\,\mathrm{Var}(\mathbf{w}^T\mathbf{r}_{n+1})\bigr\} \tag{3.1}$$

when $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are unknown and are treated as state variables whose uncertainties are specified by their posterior distributions given the observations $\mathbf{r}_1, \ldots, \mathbf{r}_n$ in a Bayesian framework. The weights $\mathbf{w}$ in (3.1) are random vectors that depend on $\mathbf{r}_1, \ldots, \mathbf{r}_n$. Note that if the prior distribution puts all its mass at $(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$, then the maximization problem (3.1) reduces to Markowitz's portfolio optimization problem that assumes $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$ are given. The Lagrange multiplier $\lambda$ in (3.1) can be regarded as the investor's risk-aversion index when variance is used to measure risk.

3.1. Solution of the optimization problem (3.1). The problem (3.1) is not a standard stochastic optimization problem because of the term $[E(\mathbf{w}^T\mathbf{r}_{n+1})]^2$ in $\mathrm{Var}(\mathbf{w}^T\mathbf{r}_{n+1}) = E[(\mathbf{w}^T\mathbf{r}_{n+1})^2] - [E(\mathbf{w}^T\mathbf{r}_{n+1})]^2$. A standard stochastic optimization problem in the Bayesian setting is of the form $\max_{a \in \mathcal{A}} E\,g(\mathbf{X}, \boldsymbol{\theta}, a)$, in which $g(\mathbf{X}, \boldsymbol{\theta}, a)$ is the reward when action $a$ is taken, $\mathbf{X}$ is a random vector with distribution $F_{\boldsymbol{\theta}}$, $\boldsymbol{\theta}$ has a prior distribution, and the maximization is over the action space $\mathcal{A}$. The key to its solution is the law of iterated conditional expectations $E\,g(\mathbf{X}, \boldsymbol{\theta}, a) = E\{E[g(\mathbf{X}, \boldsymbol{\theta}, a) \mid \mathbf{X}]\}$. It implies that the stochastic optimization problem can be solved by choosing $a$ to maximize the posterior reward $E[g(\mathbf{X}, \boldsymbol{\theta}, a) \mid \mathbf{X}]$. This key idea, however, cannot be applied to the problem of maximizing or minimizing nonlinear functions of $E\,g(\mathbf{X}, \boldsymbol{\theta}, a)$, such as the term $[E\,g(\mathbf{X}, \boldsymbol{\theta}, a)]^2$ that is involved in (3.1).

Our method of solving (3.1) is to convert it to a standard stochastic control problem by using an additional parameter. Let $W = \mathbf{w}^T\mathbf{r}_{n+1}$ and note that $E(W) - \lambda\,\mathrm{Var}(W) = h(EW, EW^2)$, where $h(x, y) = x + \lambda x^2 - \lambda y$. Let $W_B = \mathbf{w}_B^T\mathbf{r}_{n+1}$ and $\eta = 1 + 2\lambda E(W_B)$, where $\mathbf{w}_B$ is the Bayes weight vector. Then

$$\begin{aligned}
0 &\ge h(EW, EW^2) - h(EW_B, EW_B^2) \\
&= E(W) - E(W_B) - \lambda\{E(W^2) - E(W_B^2)\} + \lambda\{(EW)^2 - (EW_B)^2\} \\
&= \eta\{E(W) - E(W_B)\} + \lambda\{E(W_B^2) - E(W^2)\} + \lambda\{E(W) - E(W_B)\}^2 \\
&\ge \{\lambda E(W_B^2) - \eta E(W_B)\} - \{\lambda E(W^2) - \eta E(W)\}.
\end{aligned}$$

Moreover, the last inequality is strict unless $EW = EW_B$, in which case the first inequality is strict unless $EW^2 = EW_B^2$. This shows that the last term above is $\le 0$ or, equivalently,

$$\lambda E(W^2) - \eta E(W) \ge \lambda E(W_B^2) - \eta E(W_B), \tag{3.2}$$

and that equality holds in (3.2) if and only if $W$ has the same mean and variance as $W_B$. Hence, the stochastic optimization problem (3.1) is equivalent to minimizing $\lambda E[(\mathbf{w}^T\mathbf{r}_{n+1})^2] - \eta E(\mathbf{w}^T\mathbf{r}_{n+1})$ over weight vectors $\mathbf{w}$ that can depend on $\mathbf{r}_1, \ldots, \mathbf{r}_n$. Since $\eta = 1 + 2\lambda E(W_B)$ is a linear function of the solution of (3.1), we cannot apply this equivalence directly to the unknown $\eta$. Instead, we solve a family of standard stochastic optimization problems over $\eta$ and then choose the $\eta$ that maximizes the reward in (3.1).

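To summarize, we can solve (3.1) by rewriting it as the following maximization problem over $\eta$:

$$\max_{\eta}\bigl\{E[\mathbf{w}^T(\eta)\mathbf{r}_{n+1}] - \lambda\,\mathrm{Var}[\mathbf{w}^T(\eta)\mathbf{r}_{n+1}]\bigr\}, \tag{3.3}$$

where $\mathbf{w}(\eta)$ is the solution of the stochastic optimization problem

$$\mathbf{w}(\eta) = \arg\min_{\mathbf{w}}\bigl\{\lambda E[(\mathbf{w}^T\mathbf{r}_{n+1})^2] - \eta E(\mathbf{w}^T\mathbf{r}_{n+1})\bigr\}.$$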

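3.2. Computation of the optimal weight vector. Let $\boldsymbol{\mu}_n$ and $\mathbf{V}_n$ be the posterior mean and second moment matrix given the set $\mathcal{R}_n$ of current and past returns $\mathbf{r}_1, \ldots, \mathbf{r}_n$. Since $\mathbf{w}$ is based on $\mathcal{R}_n$, it follows from $E(\mathbf{r}_{n+1} \mid \mathcal{R}_n) = \boldsymbol{\mu}_n$ and $E(\mathbf{r}_{n+1}\mathbf{r}_{n+1}^T \mid \mathcal{R}_n) = \mathbf{V}_n$ that

$$E(\mathbf{w}^T\mathbf{r}_{n+1}) = E(\mathbf{w}^T\boldsymbol{\mu}_n), \qquad E[(\mathbf{w}^T\mathbf{r}_{n+1})^2] = E(\mathbf{w}^T\mathbf{V}_n\mathbf{w}). \tag{3.4}$$

Without short selling, the weight vector $\mathbf{w}(\eta)$ in (3.3) is given by the following analog of (1.1):

$$\mathbf{w}(\eta) = \arg\min_{\mathbf{w}:\, \mathbf{w}^T\mathbf{1} = 1,\, \mathbf{w} \ge \mathbf{0}}\bigl\{\lambda\mathbf{w}^T\mathbf{V}_n\mathbf{w} - \eta\mathbf{w}^T\boldsymbol{\mu}_n\bigr\}, \tag{3.5}$$

which can be computed by quadratic programming (e.g., by quadprog in MATLAB). When short selling is allowed but there are limits on short selling, the constraint $\mathbf{w} \ge \mathbf{0}$ can be replaced by $\mathbf{w} \ge \mathbf{w}_0$, where $\mathbf{w}_0$ is a vector of negative numbers. When there is no limit on short selling, the constraint $\mathbf{w} \ge \mathbf{0}$ in (3.5) can be removed and $\mathbf{w}(\eta)$ in (3.3) is given explicitly by

$$\mathbf{w}(\eta) = \arg\min_{\mathbf{w}:\, \mathbf{w}^T\mathbf{1} = 1}\bigl\{\lambda\mathbf{w}^T\mathbf{V}_n\mathbf{w} - \eta\mathbf{w}^T\boldsymbol{\mu}_n\bigr\} = \frac{1}{C_n}\mathbf{V}_n^{-1}\mathbf{1} + \frac{\eta}{2\lambda}\mathbf{V}_n^{-1}\Bigl(\boldsymbol{\mu}_n - \frac{A_n}{C_n}\mathbf{1}\Bigr), \tag{3.6}$$

where the second equality can be derived by using a Lagrange multiplier and

$$A_n = \boldsymbol{\mu}_n^T\mathbf{V}_n^{-1}\mathbf{1} = \mathbf{1}^T\mathbf{V}_n^{-1}\boldsymbol{\mu}_n, \qquad B_n = \boldsymbol{\mu}_n^T\mathbf{V}_n^{-1}\boldsymbol{\mu}_n, \qquad C_n = \mathbf{1}^T\mathbf{V}_n^{-1}\mathbf{1}. \tag{3.7}$$

Quadratic programming can be used to compute $\mathbf{w}(\eta)$ for more general linear and quadratic constraints than those in (3.5); see Fabozzi et al. (2007), pages 88-92.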
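The explicit solution (3.6) can be checked against the first-order condition of the Lagrangian: at $\mathbf{w}(\eta)$, the vector $2\lambda\mathbf{V}_n\mathbf{w} - \eta\boldsymbol{\mu}_n$ must be a multiple of $\mathbf{1}$. In this minimal sketch (hypothetical names), $\boldsymbol{\mu}_n$ and $\mathbf{V}_n = \boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}^T$ are built from the Freq 1 values of Section 4.2. They serve only as stand-ins for posterior moments, and $\lambda = 5$, $\eta = 10$ are arbitrary.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def second_moment(mu, sigma):
    """Second moment matrix V = Sigma + mu mu'."""
    return [[s + mi * mj for s, mj in zip(row, mu)] for row, mi in zip(sigma, mu)]


def w_eta(mu_n, v_n, lam, eta):
    """Explicit solution (3.6) of min over w'1 = 1 of lam w'V_n w - eta w'mu_n."""
    ones = [1.0] * len(mu_n)
    v_inv_1 = solve(v_n, ones)     # V_n^{-1} 1
    v_inv_mu = solve(v_n, mu_n)    # V_n^{-1} mu_n
    A = dot(mu_n, v_inv_1)         # A_n in (3.7)
    C = dot(ones, v_inv_1)         # C_n in (3.7)
    return [x / C + eta / (2 * lam) * (y - A / C * x)
            for x, y in zip(v_inv_1, v_inv_mu)]


MU = [2.42, 1.88, 1.58, 3.47]
SIGMA = [[1.17, 0.79, 0.84, 1.61],
         [0.79, 0.82, 0.61, 1.23],
         [0.84, 0.61, 1.37, 1.35],
         [1.61, 1.23, 1.35, 2.86]]
V = second_moment(MU, SIGMA)
lam, eta = 5.0, 10.0
w = w_eta(MU, V, lam, eta)
```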

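Note that (3.5) or (3.6) essentially plugs the Bayes estimates of $\boldsymbol{\mu}$ and $\mathbf{V} := \boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}^T$ into the optimal weight vector that assumes $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ to be known. However, unlike the "plug-in" efficient frontier described in the first paragraph of Section 2, we have first transformed the original mean-variance portfolio optimization problem into a "mean versus second moment" optimization problem that has an additional parameter $\eta$. We put (3.5) or (3.6) into

$$C(\eta) := E[\mathbf{w}^T(\eta)\boldsymbol{\mu}_n] + \lambda\bigl(E[\mathbf{w}^T(\eta)\boldsymbol{\mu}_n]\bigr)^2 - \lambda E[\mathbf{w}^T(\eta)\mathbf{V}_n\mathbf{w}(\eta)], \tag{3.8}$$

which is equal to $E[\mathbf{w}^T(\eta)\mathbf{r}_{n+1}] - \lambda\,\mathrm{Var}[\mathbf{w}^T(\eta)\mathbf{r}_{n+1}]$ by (3.4), and use Brent's method [Press et al. (1992), pages 359-362] to maximize $C(\eta)$. It should be noted that this argument implicitly assumes that the maximum of (3.1) is attained by some $\mathbf{w}$ and is finite. This assumption is satisfied when there are limits on short selling as in (3.5), but it may not hold when there is no limit on short selling. In fact, the explicit formula of $\mathbf{w}(\eta)$ in (3.6) can be used to express (3.8) as a quadratic function of $\eta$:

$$C(\eta) = E\Bigl(\frac{A_n}{C_n}\Bigr) + \frac{\eta}{2\lambda}E\Bigl(B_n - \frac{A_n^2}{C_n}\Bigr) + \lambda\Bigl\{E\Bigl(\frac{A_n}{C_n}\Bigr) + \frac{\eta}{2\lambda}E\Bigl(B_n - \frac{A_n^2}{C_n}\Bigr)\Bigr\}^2 - \lambda E\Bigl(\frac{1}{C_n}\Bigr) - \frac{\eta^2}{4\lambda}E\Bigl(B_n - \frac{A_n^2}{C_n}\Bigr), \tag{3.9}$$

which has a maximum only if

$$E\Bigl(B_n - \frac{A_n^2}{C_n}\Bigr)\Bigl\{E\Bigl(B_n - \frac{A_n^2}{C_n}\Bigr) - 1\Bigr\} < 0.$$

In the case $E(B_n - A_n^2/C_n)\{E(B_n - A_n^2/C_n) - 1\} > 0$, $C(\eta)$ has a minimum instead and approaches $\infty$ as $|\eta| \to \infty$. In this case, (3.1) has an infinite value and should be defined as a supremum (which is not attained) instead of a maximum.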
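To illustrate the one-dimensional search over $\eta$, the sketch below treats $(\boldsymbol{\mu}_n, \mathbf{V}_n)$ as non-random, so the outer expectations in (3.8) become trivial. This is a simplifying assumption, not the general Bayes setting. Golden-section search is used as a simpler stand-in for Brent's method. Because (3.9) is quadratic in $\eta$, setting its derivative to zero gives $\eta^* = \{1 + 2\lambda E(A_n/C_n)\}/\{1 - E(B_n - A_n^2/C_n)\}$ whenever $0 < E(B_n - A_n^2/C_n) < 1$, consistent with $\eta = 1 + 2\lambda E(W_B)$. The code uses this closed form as a check. Names and inputs are illustrative.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def w_eta(mu_n, v_n, lam, eta):
    """Explicit solution (3.6)."""
    ones = [1.0] * len(mu_n)
    v1, vmu = solve(v_n, ones), solve(v_n, mu_n)
    A, C = dot(mu_n, v1), dot(ones, v1)
    return [x / C + eta / (2 * lam) * (y - A / C * x) for x, y in zip(v1, vmu)]


def reward(mu_n, v_n, lam, eta):
    """C(eta) in (3.8) with (mu_n, V_n) treated as non-random."""
    w = w_eta(mu_n, v_n, lam, eta)
    mean = dot(w, mu_n)
    second = dot(w, [dot(row, w) for row in v_n])
    return mean + lam * mean ** 2 - lam * second


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                      # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:                            # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return (a + b) / 2.0


mu_n = [1.0, 2.0]
v_n = [[2.0, 2.0], [2.0, 8.0]]    # Sigma + mu mu' with Sigma = diag(1, 4)
lam = 1.0
eta_hat = golden_section_max(lambda e: reward(mu_n, v_n, lam, e), -1e3, 1e3)

ones = [1.0, 1.0]
v1, vmu = solve(v_n, ones), solve(v_n, mu_n)
A, B, C = dot(mu_n, v1), dot(mu_n, vmu), dot(ones, v1)
d = B - A * A / C                               # lies in (0, 1) here
eta_star = (1 + 2 * lam * A / C) / (1 - d)      # closed-form maximizer of (3.9)
print(eta_hat, eta_star, w_eta(mu_n, v_n, lam, eta_star))
```

For these inputs, $\eta^* = 3.6$ and $\mathbf{w}(\eta^*) = (0.7, 0.3)^T$ up to floating-point rounding. This is also the maximizer of $\mathbf{w}^T\boldsymbol{\mu} - \lambda\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}$ with $\boldsymbol{\Sigma} = \mathrm{diag}(1, 4)$, in line with the reduction of (3.1) to Markowitz's problem when the prior is degenerate.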

Remark. Let $\boldsymbol{\Sigma}_n$ denote the posterior covariance matrix given $\mathcal{R}_n$. Note that the law of iterated conditional expectations, from which (3.4) follows, has the following analog for $\mathrm{Var}(W)$:

$$\mathrm{Var}(W) = E[\mathrm{Var}(W \mid \mathcal{R}_n)] + \mathrm{Var}[E(W \mid \mathcal{R}_n)] = E(\mathbf{w}^T\boldsymbol{\Sigma}_n\mathbf{w}) + \mathrm{Var}(\mathbf{w}^T\boldsymbol{\mu}_n). \tag{3.10}$$

Using $\boldsymbol{\Sigma}_n$ to replace $\boldsymbol{\Sigma}$ in the optimal weight vector that assumes $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ to be known therefore ignores the variance of $\mathbf{w}^T\boldsymbol{\mu}_n$ in (3.10). This omission is an important root cause of the Markowitz optimization enigma related to "plug-in" efficient frontiers.
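The decomposition (3.10) can be made concrete with a toy posterior that takes two equally likely states, each with its own weight vector, posterior mean and posterior covariance. All numbers are made up for illustration. Exact rational arithmetic (`fractions.Fraction`) makes the identity hold exactly. The second component, $\mathrm{Var}(\mathbf{w}^T\boldsymbol{\mu}_n)$, is precisely the part that a plug-in evaluation based on $\mathbf{w}^T\boldsymbol{\Sigma}_n\mathbf{w}$ leaves out.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def quad(w, m):
    """Quadratic form w' M w."""
    return dot(w, [dot(row, w) for row in m])


def variance_decomposition(scenarios):
    """scenarios: (prob, w, mu_n, sigma_n); given the state, W has mean w'mu_n, variance w'Sigma_n w."""
    e_w = sum(p * dot(w, mu) for p, w, mu, _ in scenarios)
    e_w2 = sum(p * (quad(w, sig) + dot(w, mu) ** 2) for p, w, mu, sig in scenarios)
    var_w = e_w2 - e_w ** 2                                              # Var(W), directly
    e_cond_var = sum(p * quad(w, sig) for p, w, _, sig in scenarios)    # E(w' Sigma_n w)
    var_cond_mean = sum(p * dot(w, mu) ** 2 for p, w, mu, _ in scenarios) - e_w ** 2  # Var(w' mu_n)
    return var_w, e_cond_var, var_cond_mean


half = Fraction(1, 2)
states = [
    (half, [half, half], [Fraction(2), Fraction(1)],
     [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]),
    (half, [Fraction(3, 4), Fraction(1, 4)], [Fraction(1), Fraction(4)],
     [[Fraction(2), half], [half, Fraction(1)]]),
]
print(variance_decomposition(states))
# -> (Fraction(61, 64), Fraction(15, 16), Fraction(1, 64))
```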

4. Empirical Bayes, bootstrap approximation and frequentist risk. For more flexible modeling, one can allow the prior distribution in the preceding Bayesian approach to include unspecified hyperparameters, which can be estimated from the training sample by maximum likelihood, the method of moments or other methods. For example, for the conjugate prior (2.2), we can assume $\boldsymbol{\nu}$ and $\boldsymbol{\Psi}$ to be functions of certain hyperparameters that are associated with a multifactor model of the type (2.1). This amounts to using an empirical Bayes model for $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ in the stochastic optimization problem (3.1). Besides a prior distribution for $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, (3.1) also requires specification of the common distribution of the i.i.d. returns to evaluate $E_{\boldsymbol{\mu},\boldsymbol{\Sigma}}(\mathbf{w}^T\mathbf{r}_{n+1})$ and $\mathrm{Var}_{\boldsymbol{\mu},\boldsymbol{\Sigma}}(\mathbf{w}^T\mathbf{r}_{n+1})$. The bootstrap provides a nonparametric method to evaluate these quantities, as described below.

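4.1. Bootstrap estimate of performance. To begin with, note that we can evaluate the frequentist performance of asset allocation rules by making use of the bootstrap method. The bootstrap samples $\{\mathbf{r}_{b1}^*, \ldots, \mathbf{r}_{bn}^*\}$, $1 \le b \le B$, drawn with replacement from the observed sample $\{\mathbf{r}_1, \ldots, \mathbf{r}_n\}$, can be used for this purpose. They estimate $E_{\boldsymbol{\mu},\boldsymbol{\Sigma}}(\mathbf{w}_n^T\mathbf{r}_{n+1}) = E_{\boldsymbol{\mu},\boldsymbol{\Sigma}}(\mathbf{w}_n^T\boldsymbol{\mu})$ and $\mathrm{Var}_{\boldsymbol{\mu},\boldsymbol{\Sigma}}(\mathbf{w}_n^T\mathbf{r}_{n+1}) = E_{\boldsymbol{\mu},\boldsymbol{\Sigma}}(\mathbf{w}_n^T\boldsymbol{\Sigma}\mathbf{w}_n) + \mathrm{Var}_{\boldsymbol{\mu},\boldsymbol{\Sigma}}(\mathbf{w}_n^T\boldsymbol{\mu})$ for various portfolios $\Pi$ whose weight vectors $\mathbf{w}_n$ may depend on $\mathbf{r}_1, \ldots, \mathbf{r}_n$. In particular, we can use Bayes or other estimators for $\boldsymbol{\mu}_n$ and $\mathbf{V}_n$ in (3.5) or (3.6), and then choose $\eta$ to maximize the bootstrap estimate of $E_{\boldsymbol{\mu},\boldsymbol{\Sigma}}(\mathbf{w}^T\mathbf{r}_{n+1}) - \lambda\,\mathrm{Var}_{\boldsymbol{\mu},\boldsymbol{\Sigma}}(\mathbf{w}^T\mathbf{r}_{n+1})$. This is tantamount to using the empirical distribution of $\mathbf{r}_1, \ldots, \mathbf{r}_n$ as the common distribution of the returns. In particular, using $\bar{\mathbf{r}}$ for $\boldsymbol{\mu}_n$ and the second moment matrix $n^{-1}\sum_{t=1}^n \mathbf{r}_t\mathbf{r}_t^T$ for $\mathbf{V}_n$ in (3.5) or (3.6) provides a "nonparametric empirical Bayes" variant of the optimal rule in Section 3, abbreviated by NPEB hereafter.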
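Here is a compact sketch of the NPEB rule of Section 4.1. It covers the case with no limit on short selling, so that the explicit formula (3.6) applies. The simulation study below uses the no-short-selling version (3.5), which needs a quadratic programming solver. The empirical distribution of the data plays the role of the true return distribution, and each bootstrap sample yields a weight vector $\mathbf{w}_b(\eta)$. Since $\mathbf{w}(\eta)$ is affine in $\eta$, the bootstrap reward is an exact concave quadratic in $\eta$, so three evaluations determine its maximizer. This replaces a general-purpose search such as Brent's method. Names, data and seeds are illustrative assumptions.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def moments(sample):
    """Sample mean vector and second moment matrix n^{-1} sum_t r_t r_t'."""
    n, m = len(sample), len(sample[0])
    mean = [sum(r[i] for r in sample) / n for i in range(m)]
    second = [[sum(r[i] * r[j] for r in sample) / n for j in range(m)] for i in range(m)]
    return mean, second


def w_eta_parts(mu_n, v_n, lam):
    """Split (3.6) as w(eta) = a + eta * b."""
    ones = [1.0] * len(mu_n)
    v1, vmu = solve(v_n, ones), solve(v_n, mu_n)
    A, C = dot(mu_n, v1), dot(ones, v1)
    a = [x / C for x in v1]
    b = [(y - A / C * x) / (2 * lam) for x, y in zip(v1, vmu)]
    return a, b


def npeb(returns, lam, n_boot=200, seed=0):
    """Choose eta by maximizing the bootstrap reward; return (eta*, weights, reward)."""
    rng = random.Random(seed)
    n = len(returns)
    mu, second = moments(returns)    # empirical distribution acts as the truth
    sigma = [[s - mi * mj for s, mj in zip(row, mu)] for row, mi in zip(second, mu)]
    parts = []
    for _ in range(n_boot):
        boot = [returns[rng.randrange(n)] for _ in range(n)]
        parts.append(w_eta_parts(*moments(boot), lam))

    def reward(eta):
        """Bootstrap estimate of E(w'mu) - lam * [E(w' Sigma w) + Var(w'mu)]."""
        ws = [[ai + eta * bi for ai, bi in zip(a, b)] for a, b in parts]
        means = [dot(w, mu) for w in ws]
        e_mean = sum(means) / n_boot
        var_mean = sum((x - e_mean) ** 2 for x in means) / n_boot
        e_risk = sum(dot(w, [dot(row, w) for row in sigma]) for w in ws) / n_boot
        return e_mean - lam * (e_risk + var_mean)

    r0, r1, r2 = reward(0.0), reward(1.0), reward(2.0)   # exact quadratic in eta
    c2 = (r2 - 2 * r1 + r0) / 2
    c1 = r1 - r0 - c2
    eta_star = -c1 / (2 * c2)
    a, b = w_eta_parts(mu, second, lam)    # final weights from the full sample
    return eta_star, [ai + eta_star * bi for ai, bi in zip(a, b)], reward


data_rng = random.Random(1)
returns = [[data_rng.gauss(m, s) for m, s in zip([2.0, 1.5, 3.0], [1.0, 0.8, 1.6])]
           for _ in range(24)]
eta_star, w_npeb, reward = npeb(returns, lam=1.0)
```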

4.2. A simulation study of Bayes and frequentist rewards. The following simulation study assumes i.i.d. annual returns (in %) of $m = 4$ assets whose mean vector and covariance matrix are generated from the normal and inverted Wishart prior distribution (2.2) with $\kappa = 5$, $n_0 = 10$, $\boldsymbol{\nu} = (2.48, 2.17, 1.61, 3.42)^T$ and the hyperparameter $\boldsymbol{\Psi}$ given by

$\Psi_{11} = 3.37$, $\Psi_{22} = 4.22$, $\Psi_{33} = 2.75$, $\Psi_{44} = 8.43$, $\Psi_{12} = 2.04$, $\Psi_{13} = 0.32$, $\Psi_{14} = 1.59$, $\Psi_{23} = -0.05$, $\Psi_{24} = 3.02$, $\Psi_{34} = 1.08$.
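Drawing $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ from the prior (2.2) with these hyperparameters can be sketched as follows. The sketch relies on two facts. First, $\boldsymbol{\Sigma} \sim IW_m(\boldsymbol{\Psi}, n_0)$ is equivalent to $\boldsymbol{\Sigma}^{-1} \sim W_m(\boldsymbol{\Psi}^{-1}, n_0)$. Second, for integer $n_0 \ge m$, a Wishart draw is a sum of $n_0$ outer products of independent $N(\mathbf{0}, \boldsymbol{\Psi}^{-1})$ vectors. It is an illustrative pure-Python sketch with hypothetical names, not the code used for the study.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def inverse(a):
    """Matrix inverse, column by column."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular L with L L' = a; raises ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def sample_prior(nu, kappa, psi, n0, rng):
    """Draw (mu, Sigma) from (2.2): Sigma ~ IW_m(psi, n0), mu | Sigma ~ N(nu, Sigma/kappa)."""
    m = len(nu)
    chol = cholesky(inverse(psi))              # factor of psi^{-1}
    prec = [[0.0] * m for _ in range(m)]       # Sigma^{-1} ~ Wishart(psi^{-1}, n0)
    for _ in range(n0):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        x = [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        for i in range(m):
            for j in range(m):
                prec[i][j] += x[i] * x[j]
    sigma = inverse(prec)
    chol_s = cholesky([[s / kappa for s in row] for row in sigma])
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    mu = [nu[i] + sum(chol_s[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
    return mu, sigma


NU = [2.48, 2.17, 1.61, 3.42]
PSI = [[3.37, 2.04, 0.32, 1.59],
       [2.04, 4.22, -0.05, 3.02],
       [0.32, -0.05, 2.75, 1.08],
       [1.59, 3.02, 1.08, 8.43]]
mu_draw, sigma_draw = sample_prior(NU, kappa=5, psi=PSI, n0=10, rng=random.Random(0))
```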
Table 1
Rewards of four portfolios formed from $m = 4$ assets (standard errors in parentheses)

| $\lambda$ | Scenario | Bayes | Plug-in | Oracle | NPEB |
|---|---|---|---|---|---|
| 1 | Bayes | 0.0324 (2.47e-5) | 0.0317 (2.55e-5) | 0.0328 (2.27e-5) | 0.0324 (2.01e-5) |
| 1 | Freq 1 | 0.0332 (2.61e-6) | 0.0324 (5.62e-6) | 0.0332 | 0.0332 (2.56e-6) |
| 1 | Freq 2 | 0.0293 (7.23e-6) | 0.0282 (5.32e-6) | 0.0298 | 0.0293 (7.12e-6) |
| 1 | Freq 3 | 0.0267 (4.54e-6) | 0.0257 (5.57e-6) | 0.0268 | 0.0267 (4.73e-6) |
| 5 | Bayes | 0.0262 (2.33e-5) | 0.0189 (1.21e-5) | 0.0267 (2.02e-5) | 0.0262 (1.89e-5) |
| 5 | Freq 1 | 0.0272 (4.06e-6) | 0.0182 (5.54e-6) | 0.0273 | 0.0272 (2.60e-6) |
| 5 | Freq 2 | 0.0233 (9.35e-6) | 0.0183 (3.88e-6) | 0.0240 | 0.0234 (1.03e-5) |
| 5 | Freq 3 | 0.0235 (5.25e-6) | 0.0159 (2.88e-6) | 0.0237 | 0.0235 (5.27e-6) |
| 10 | Bayes | 0.0184 (2.54e-5) | 0.0067 (7.16e-6) | 0.0190 (2.08e-5) | 0.0183 (2.23e-5) |
| 10 | Freq 1 | 0.0197 (7.95e-6) | 0.0063 (3.63e-6) | 0.0199 | 0.0198 (4.19e-6) |
| 10 | Freq 2 | 0.0157 (1.0?e-5) | 0.0072 (3.00e-6) | 0.0168 | 0.0159 (1.13e-5) |
| 10 | Freq 3 | 0.0195 (6.59e-6) | 0.0083 (1.62e-6) | 0.0198 | 0.0196 (5.95e-6) |

We consider four scenarios for the case $n = 6$ without short selling. The first scenario assumes this prior distribution and studies the Bayesian reward for $\lambda = 1$, 5 and 10. The other scenarios consider the frequentist reward at three values of $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ generated from the prior distribution. These values, denoted by Freq 1, Freq 2 and Freq 3, are as follows:

Freq 1: $\boldsymbol{\mu} = (2.42, 1.88, 1.58, 3.47)^T$, $\Sigma_{11} = 1.17$, $\Sigma_{22} = 0.82$, $\Sigma_{33} = 1.37$, $\Sigma_{44} = 2.86$, $\Sigma_{12} = 0.79$, $\Sigma_{13} = 0.84$, $\Sigma_{14} = 1.61$, $\Sigma_{23} = 0.61$, $\Sigma_{24} = 1.23$, $\Sigma_{34} = 1.35$.

Freq 2: $\boldsymbol{\mu} = (2.59, 2.29, 1.25, 3.13)^T$, $\Sigma_{11} = 1.32$, $\Sigma_{22} = 0.67$, $\Sigma_{33} = 1.43$, $\Sigma_{44} = 1.03$, $\Sigma_{12} = 0.75$, $\Sigma_{13} = 0.85$, $\Sigma_{14} = 0.68$, $\Sigma_{23} = 0.32$, $\Sigma_{24} = 0.44$, $\Sigma_{34} = 0.61$.

Freq 3: $\boldsymbol{\mu} = (1.91, 1.58, 1.03, 2.76)^T$, $\Sigma_{11} = 1.00$, $\Sigma_{22} = 0.83$, $\Sigma_{33} = 0.35$, $\Sigma_{44} = 0.62$, $\Sigma_{12} = 0.73$, $\Sigma_{13} = 0.26$, $\Sigma_{14} = 0.36$, $\Sigma_{23} = 0.16$, $\Sigma_{24} = 0.50$, $\Sigma_{34} = 0.14$.

Table 1 compares the Bayes rule that maximizes (3.1), called "Bayes" hereafter, with three other rules:

(a) the "oracle" rule, which assumes $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ to be known;
(b) the plug-in rule, which replaces $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ by their sample estimates;
(c) the NPEB (nonparametric empirical Bayes) rule described in Section 4.1.

Although both (b) and (c) use the sample mean vector and sample covariance (or second moment) matrix, (b) simply plugs the sample estimates into the oracle rule, while (c) uses the empirical distribution to replace the common distribution of the returns in the Bayes rule. For the plug-in rule, the quadratic programming procedure may have numerical difficulties if the sample covariance matrix is nearly singular. If this happens, we use the default option of adding $0.005\mathbf{I}$ to the sample covariance matrix. Each result in Table 1 is based on 500 simulations, and the standard errors are given in parentheses. In each scenario, the reward of the NPEB rule is close to that of the Bayes rule and somewhat smaller than that of the oracle rule. The plug-in rule has substantially smaller rewards, especially for larger values of $\lambda$.

Fig. 1. $(\sigma, \mu)$ curves of different portfolios. [Figure: horizontal axis $\sigma$, vertical axis $\mu$.]

4.3. Comparison of the $(\sigma, \mu)$ plots of different portfolios. The set of points in the $(\sigma, \mu)$ plane that correspond to the returns of portfolios of the $m$ assets is called the feasible region. As $\lambda$ varies over $(0, \infty)$, the $(\sigma, \mu)$ values of the oracle rule correspond to Markowitz's efficient frontier, which assumes known $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ and is the upper left boundary of the feasible region. For portfolios whose weights do not assume knowledge of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, the $(\sigma, \mu)$ values lie to the right of Markowitz's efficient frontier. Figure 1 plots the $(\sigma, \mu)$ values of different portfolios formed from $m = 4$ assets without short selling and a training sample of size $n = 6$ when $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is given by the frequentist scenario Freq 1 above. Markowitz's efficient frontier is computed analytically by varying $\mu_*$ in (1.1) over a grid of values. The $(\sigma, \mu)$ curves of the plug-in, covariance-shrinkage [Ledoit and Wolf (2004)] and Michaud's resampled portfolios are computed by Monte Carlo, using 500 simulated paths, for each value of $\mu_*$ in a grid ranging from 2.0 to 3.47. The $(\sigma, \mu)$ curve of the NPEB portfolio is also obtained by Monte Carlo simulations with 500 runs, using different values of $\lambda > 0$ in a grid. Among the $(\sigma, \mu)$ curves of the various portfolios that do not assume knowledge of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, this curve is relatively close to Markowitz's efficient frontier, as shown in Figure 1. For the covariance-shrinkage portfolio, we use a constant correlation model for $\mathbf{F}$ in (2.4), which can be implemented by their software available at www.ledoit.net. Note that Markowitz's efficient frontier has $\mu$ values ranging from 2.0 to 3.47, the latter being the largest component of $\boldsymbol{\mu}$ in Freq 1. The $(\sigma, \mu)$ curve of NPEB lies below the efficient frontier. Further below, in decreasing order, are the $(\sigma, \mu)$ curves of Michaud's, the covariance-shrinkage and the plug-in portfolios. These $(\sigma, \mu)$ curves are what Broadie (1993) calls the actual frontiers.

The highest values 3.22, 3.22 and 3.16 of $\mu$ for the plug-in, covariance-shrinkage and Michaud's portfolios in Figure 1 are attained with the target value $\mu_* = 3.47$, and the corresponding values of $\sigma$ are 1.54, 1.54 and 3.16, respectively. Note that without short selling, the constraint $\mathbf{w}^T\hat{\boldsymbol{\mu}} = \mu_*$ used in these portfolios cannot hold if $\max_{1 \le i \le 4} \hat\mu_i < \mu_*$. We therefore need a default option, such as replacing $\mu_*$ by $\min(\mu_*, \max_{1 \le i \le 4} \hat\mu_i)$, to implement the optimization procedures for these portfolios. In contrast, the NPEB portfolio can always be implemented for any given value of $\lambda$. In particular, for $\lambda = 0.001$, the NPEB portfolio has $\mu = 3.470$ and $\sigma = 1.691$.
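The frontier in Figure 1 and the default option just described can be sketched in Python/NumPy, assuming (1.1) is the long-only problem $\min \mathbf{w}^T\Sigma\mathbf{w}$ subject to $\mathbf{w}^T\boldsymbol{\mu} = \mu_*$, $\mathbf{w}^T\mathbf{1} = 1$ and $\mathbf{w} \ge 0$. Instead of a quadratic programming library, the sketch solves the KKT system on every support and keeps the feasible minimum, which is exact but only practical for small $m$; the function name is hypothetical. For a plug-in portfolio one would pass $\hat{\boldsymbol{\mu}}$ and the sample covariance matrix instead of the true values.

```python
import itertools

import numpy as np

# Freq 1 scenario from the text.
MU = np.array([2.42, 1.88, 1.58, 3.47])
SIGMA = np.array([
    [1.17, 0.79, 0.84, 1.61],
    [0.79, 0.82, 0.61, 1.23],
    [0.84, 0.61, 1.37, 1.35],
    [1.61, 1.23, 1.35, 2.86],
])


def frontier_point(mu, sigma, target, tol=1e-10):
    """Long-only minimum-variance portfolio with mean `target`.

    Applies the default option target -> min(target, max(mu)) first.
    """
    target = min(target, mu.max())
    m = len(mu)
    best_w, best_var = None, np.inf
    for k in range(1, m + 1):
        for support in itertools.combinations(range(m), k):
            idx = list(support)
            if k == 1:
                if abs(mu[idx[0]] - target) > tol:
                    continue  # a single asset only attains its own mean
                w_sub = np.ones(1)
            else:
                # KKT system of min w'Sw s.t. w'mu = target, w'1 = 1 on idx.
                kkt = np.zeros((k + 2, k + 2))
                kkt[:k, :k] = 2.0 * sigma[np.ix_(idx, idx)]
                kkt[:k, k] = kkt[k, :k] = mu[idx]
                kkt[:k, k + 1] = kkt[k + 1, :k] = 1.0
                rhs = np.concatenate([np.zeros(k), [target, 1.0]])
                try:
                    w_sub = np.linalg.solve(kkt, rhs)[:k]
                except np.linalg.LinAlgError:
                    continue
                if np.any(w_sub < -tol):
                    continue  # optimum on this face needs short selling
            w = np.zeros(m)
            w[idx] = np.clip(w_sub, 0.0, None)
            var = w @ sigma @ w
            if var < best_var:
                best_w, best_var = w, var
    if best_w is None:
        raise ValueError("target is below the smallest attainable mean")
    return best_w, float(np.sqrt(best_var)), float(best_w @ mu)


for target in np.linspace(2.0, 3.47, 8):
    _, sd, mean = frontier_point(MU, SIGMA, target)
    print(f"mu* = {target:.3f}: sigma = {sd:.3f}, mu = {mean:.3f}")
```

With Freq 1 and $\mu_* = 3.47$ the only feasible long-only portfolio is the fourth asset, with standard deviation $\sqrt{2.86} \approx 1.691$, which agrees with the NPEB value quoted above for $\lambda = 0.001$.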

5. Connecting theory to practice. While Section 4 has considered prac- 
tical implementation of the theory in Section 3, we develop the methodology 
further in this section to connect the basic theory to practice. 

5.1. The information ratios and choice of $\lambda$. As pointed out in Section 1, the $\lambda$ in Section 3 is related to how risk-averse one is when one tries to maximize the expected utility of a portfolio. It represents a penalty on the risk, measured by the variance of the portfolio's return. In practice, it may be difficult to specify the risk aversion parameter $\lambda$ of an investor that is needed in the theory in Section 3.1. A commonly used measure of a portfolio's performance is the information ratio $(\mu - \mu_0)/\sigma_e$, which is the excess return per unit of risk; the excess is measured by $\mu - \mu_0$, where $\mu_0 = E(r_0)$, $r_0$ is the return of the benchmark investment and $\sigma_e^2$ is the variance of the excess return. We can regard $\lambda$ as a tuning parameter and choose it to maximize the information ratio by modifying the NPEB procedure in Section 3.2, where the bootstrap estimate of $E[\mathbf{w}^T(\eta)\mathbf{r}] - \lambda\mathrm{Var}[\mathbf{w}^T(\eta)\mathbf{r}]$ is used to find the portfolio weight $\mathbf{w}_\lambda$ that solves the optimization problem (3.3). Specifically, we use the bootstrap estimate of the information ratio

(5.1) $\hat{E}(\mathbf{w}_\lambda^T\mathbf{r} - r_0) \big/ \sqrt{\widehat{\mathrm{Var}}(\mathbf{w}_\lambda^T\mathbf{r} - r_0)}$

of $\mathbf{w}_\lambda$, and maximize the estimated information ratios over $\lambda$ in a grid, as illustrated in Section 6.
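A minimal sketch of the bootstrap estimate (5.1) and the grid search over $\lambda$ follows. It is not the NPEB weight $\mathbf{w}(\eta)$ of Section 3: as a hypothetical stand-in rule it uses the closed-form maximizer of $\mathbf{w}^T\bar{\mathbf{e}} - \lambda\mathbf{w}^T S\mathbf{w}$ subject to $\mathbf{w}^T\mathbf{1} = 1$ (short selling allowed). Working with excess returns $\mathbf{e}_t = \mathbf{r}_t - r_{0,t}\mathbf{1}$ is an assumption that makes $\mathbf{w}^T\mathbf{e}_t = \mathbf{w}^T\mathbf{r}_t - r_{0,t}$ whenever the weights sum to one; the data are simulated and all function names are illustrative.

```python
import numpy as np


def budget_mv_weights(mean, cov, lam):
    """Stand-in rule: maximize w'mean - lam * w'cov w subject to sum(w) = 1."""
    inv_m = np.linalg.solve(cov, mean)
    inv_1 = np.linalg.solve(cov, np.ones(len(mean)))
    gamma = (inv_m.sum() - 2.0 * lam) / inv_1.sum()
    return (inv_m - gamma * inv_1) / (2.0 * lam)


def bootstrap_information_ratio(excess, lam, boot_idx):
    """Bootstrap estimate of E(w'e) / sqrt(Var(w'e)) for the rule above.

    The weights are refitted on each bootstrap training sample, and the
    next-period excess return is drawn independently from the empirical
    distribution of the original training sample.
    """
    e_bar = excess.mean(axis=0)
    second = excess.T @ excess / len(excess)  # empirical E(e e')
    means, seconds = [], []
    for idx in boot_idx:
        boot = excess[idx]
        w = budget_mv_weights(boot.mean(axis=0), np.cov(boot, rowvar=False), lam)
        means.append(w @ e_bar)
        seconds.append(w @ second @ w)
    mean = np.mean(means)
    return mean / np.sqrt(np.mean(seconds) - mean ** 2)


def choose_lambda(excess, grid, n_boot=200, seed=0):
    """Pick the lambda on the grid with the largest estimated information ratio."""
    rng = np.random.default_rng(seed)
    n = len(excess)
    boot_idx = rng.integers(0, n, size=(n_boot, n))  # same resamples for every lambda
    ratios = {lam: bootstrap_information_ratio(excess, lam, boot_idx) for lam in grid}
    return max(ratios, key=ratios.get), ratios


rng = np.random.default_rng(42)
excess = rng.normal([0.002, 0.004, 0.001, 0.003], 0.03, size=(120, 4))
grid = [2.0 ** i for i in range(-3, 7)]
best, ratios = choose_lambda(excess, grid)
print(best, ratios[best])
```

Reusing the same bootstrap resamples for every $\lambda$ (common random numbers) makes the comparison across the grid less noisy than drawing fresh resamples for each value.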
5.2. Dimension reduction when $m$ is not small relative to $n$. Another statistical issue encountered in practice is the large number $m$ of assets relative to the number $n$ of past periods in the training sample, which makes it difficult to estimate $\boldsymbol{\mu}$ and $\Sigma$ satisfactorily. Using factor models that are related to domain knowledge, as in Section 2.1, helps reduce the number of parameters to be estimated in an empirical Bayes approach.

An obvious way of dimension reduction when there is no short selling is to exclude from consideration assets with markedly inferior information ratios. The only potential advantage of including them in the portfolio is that they may reduce the portfolio variance if they are negatively correlated with the "superior" assets. However, since the correlations are unknown, this advantage is unlikely to materialize when the correlations are not estimated well enough. Suppose we include two more assets in the simulation study of Section 4.2, so that the six asset returns are jointly normal. The additional hyperparameters of the normal and inverted Wishart prior distribution (2.2) are $\nu_5 = -0.014$, $\nu_6 = -0.064$, $\Psi_{55} = 2.02$, $\Psi_{66} = 10.32$, $\Psi_{56} = 0.90$, $\Psi_{15} = -0.17$, $\Psi_{25} = -0.03$, $\Psi_{35} = -0.91$, $\Psi_{45} = -0.33$, $\Psi_{16} = -3.40$, $\Psi_{26} = -3.99$, $\Psi_{36} = -0.08$ and $\Psi_{46} = -3.58$. As in Section 4.2, we consider four scenarios for the case of $n = 8$ without short selling, the first of which assumes this prior distribution and studies the Bayesian reward for $\lambda = 1$, 5 and 10. Table 2 shows the rewards for the four rules in Section 4.2; each result is based on 500 simulations. Note that the value of the reward function does not change significantly with the inclusion of the two additional stocks, which are negatively correlated with the four stocks in Section 4.2 but have low information ratios. This shows that excluding stocks with markedly inferior information ratios when there is no short selling can reduce $m$ substantially in practice. In Section 6 we describe another way of choosing stocks from a universe of available stocks to reduce $m$.
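The screening idea can be sketched as ranking assets by their sample information ratios on the training sample and keeping the best ones. The text does not specify what counts as "markedly inferior", so the top-$k$ rule, the function name and the simulated data below are hypothetical simplifications.

```python
import numpy as np


def screen_by_information_ratio(excess, keep):
    """Return sorted indices of the `keep` assets with the largest sample IR.

    excess: (n, m) array of training-sample excess returns over the benchmark.
    """
    ir = excess.mean(axis=0) / excess.std(axis=0, ddof=1)
    top = np.argsort(ir)[::-1][:keep]
    return np.sort(top), ir


rng = np.random.default_rng(0)
noise = rng.standard_normal((120, 6))
noise = (noise - noise.mean(axis=0)) / noise.std(axis=0, ddof=1)  # exact mean 0, sd 1
means = np.array([0.010, 0.008, 0.006, 0.004, -0.002, -0.004])
excess = means + 0.02 * noise  # sample IR of asset j is exactly means[j] / 0.02
kept, ir = screen_by_information_ratio(excess, keep=4)
print(kept, np.round(ir, 2))
```

A threshold rule, such as dropping assets whose information ratio is far below the best one, would fit the same interface; either way the screened assets are then passed to the portfolio optimization.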



Table 2

Rewards of four portfolios formed from $m = 6$ assets

λ    Scenario   Bayes              Plug-in            Oracle             NPEB
1    Bayes      0.0325 (2.55e-5)   0.0318 (2.62e-6)   0.0331 (2.42e-5)   0.0325 (2.53e-5)
     Freq 1     0.0284 (1.59e-5)   0.0277 (1.31e-5)   0.0296             0.0285 (1.62e-5)
     Freq 2     0.0292 (8.30e-6)   0.0280 (7.95e-6)   0.0296             0.0292 (8.29e-6)
     Freq 3     0.0283 (1.00e-5)   0.0282 (9.11e-6)   0.0300             0.0283 (1.05e-5)
5    Bayes      0.0255 (2.46e-5)   0.0183 (1.44e-5)   0.0263 (2.05e-?)   0.0254 (2.45e-5)
     Freq 1     0.0236 (1.99e-5)   0.0149 (6.48e-6)   0.0250             0.0237 (2.17e-5)
     Freq 2     0.0241 (9.34e-6)   0.0166 (3.61e-6)   0.0246             0.0243 (8.95e-6)
     Freq 3     0.0189 (2.09e-5)   0.0138 (1.45e-5)   0.0219             0.0208 (2.32e-5)
10   Bayes      0.0171 (2.63e-5)   0.0039 (1.57e-5)   0.0180 (2.20e-5)   0.0171 (2.72e-5)
     Freq 1     0.0174 (2.06e-5)   0.0042 (5.19e-6)   0.0193             0.0177 (2.42e-5)
     Freq 2     0.0177 (1.12e-5)   0.0052 (6.34e-6)   0.0184             0.0180 (1.10e-5)
     Freq 3     0.0089 (2.79e-5)   0.0024 (1.33e-5)   0.0120             0.0094 (4.65e-5)






T. L. LAI, H. XING AND Z. CHEN 



5.3. Extension to time series models of returns. An important assumption in the modification of Markowitz's theory in Section 3.2 is that the $\mathbf{r}_t$ are i.i.d. with mean $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. Diagnostic checks of the extent to which this assumption is violated should be carried out in practice. The stochastic optimization theory in Section 3.1 does not actually need this assumption and only requires the posterior mean and second moment matrix of the return vector for the next period in (3.4). Therefore, one can modify the "working i.i.d. model" accordingly when the diagnostic checks reveal that such modifications are needed.

A simple way to introduce such a modification is to use a stochastic regression model of the form

(5.2) $r_{it} = \boldsymbol{\beta}_i^T\mathbf{x}_{i,t-1} + \epsilon_{it}$,

where the components of $\mathbf{x}_{i,t-1}$ include 1, factor variables such as the return of a market portfolio like the S&P500 at time $t - 1$, and lagged variables $r_{i,t-1}, r_{i,t-2}, \ldots$. The basic idea underlying (5.2) is to introduce covariates (including lagged variables to account for time series effects) so that the errors $\epsilon_{it}$ can be regarded as i.i.d., as in the working i.i.d. model. The regression parameter $\boldsymbol{\beta}_i$ can be estimated by the method of moments, which is equivalent to least squares. We can also include heteroskedasticity by assuming that $\epsilon_{it} = s_{i,t-1}(\boldsymbol{\gamma}_i)z_{it}$, where the $z_{it}$ are i.i.d. with mean 0 and variance 1, $\boldsymbol{\gamma}_i$ is a parameter vector that can be estimated by maximum likelihood or the generalized method of moments, and $s_{i,t-1}$ is a given function that depends on $r_{i,t-1}, r_{i,t-2}, \ldots$. A well-known example is the GARCH(1, 1) model

(5.3) $\epsilon_{it} = s_{i,t-1}z_{it}$, $\quad s_{i,t-1}^2 = \omega_i + a_i s_{i,t-2}^2 + b_i r_{i,t-1}^2$,

for which $\boldsymbol{\gamma}_i = (\omega_i, a_i, b_i)$.
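A minimal sketch of the two building blocks for one asset: least squares for $\boldsymbol{\beta}_i$ in (5.2) and a GARCH(1, 1)-type variance recursion that produces $s_{i,t-1}^2$ and the standardized residuals $z_{it}$. The parameters $(\omega_i, a_i, b_i)$ would be fitted by maximum likelihood or GMM; the values below are made up. The recursion takes a generic `shocks` series: passing the regression residuals, as done here, gives the textbook GARCH(1, 1), while passing the returns follows (5.3) as printed. Function names and data are illustrative.

```python
import numpy as np


def fit_mean_equation(y, X):
    """Least squares estimate of beta_i in (5.2) and the residuals eps_hat."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def garch_variances(shocks, omega, a, b, s2_init):
    """Recursion s2[t] = omega + a * s2[t-1] + b * shocks[t-1]**2.

    s2[t] is the conditional variance that scales period t (s^2_{t-1} in (5.3));
    the extra last entry is the one-step-ahead forecast.
    """
    s2 = np.empty(len(shocks) + 1)
    s2[0] = s2_init
    for t in range(1, len(s2)):
        s2[t] = omega + a * s2[t - 1] + b * shocks[t - 1] ** 2
    return s2


rng = np.random.default_rng(1)
r = 0.01 + 0.05 * rng.standard_normal(121)  # simulated monthly returns
y = r[1:]
X = np.column_stack([np.ones(120), r[:-1]])  # x_{t-1} = (1, r_{t-1})
beta, resid = fit_mean_equation(y, X)
# Hypothetical gamma = (omega, a, b); in practice estimated by ML or GMM.
s2 = garch_variances(resid, omega=1e-4, a=0.8, b=0.1, s2_init=resid.var())
z = resid / np.sqrt(s2[:-1])  # standardized residuals z_t
print(beta, z.std(), np.sqrt(s2[-1]))
```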

Consider the stochastic regression model (5.2). As noted in Section 3.2, a key ingredient in the optimal weight vector that solves the optimization problem (3.1) is $(\boldsymbol{\mu}_n, V_n)$, where $\boldsymbol{\mu}_n = E(\mathbf{r}_{n+1} \mid \mathcal{R}_n)$ and $V_n = E(\mathbf{r}_{n+1}\mathbf{r}_{n+1}^T \mid \mathcal{R}_n)$. Instead of the classical model of i.i.d. returns, one can combine domain knowledge of the $m$ assets with time series modeling to obtain better predictors of future returns via $\boldsymbol{\mu}_n$ and $V_n$. The regressors $\mathbf{x}_{i,t-1}$ in (5.2) can be chosen to build a combined substantive-empirical model for prediction; see Section 7.5 of Lai and Xing (2008). Since the model (5.2) is intended to produce i.i.d. $\boldsymbol{\epsilon}_t = (\epsilon_{1t}, \ldots, \epsilon_{mt})^T$, or i.i.d. $\mathbf{z}_t = (z_{1t}, \ldots, z_{mt})^T$ after adjusting for conditional heteroskedasticity as in (5.3), we can still use the NPEB approach to determine the optimal weight vector, bootstrapping from the estimated common distribution of $\boldsymbol{\epsilon}_t$ (or $\mathbf{z}_t$). Note that (5.2) and (5.3) model the asset returns separately, instead of jointly in a multivariate regression or multivariate GARCH model, which has too many parameters to estimate. While the vectors $\boldsymbol{\epsilon}_t$ (or $\mathbf{z}_t$) are assumed to be i.i.d., (5.2) [or (5.3)] does not assume their components to be uncorrelated, since it treats the components separately rather than jointly. The conditional cross-sectional covariance between the returns of assets $i$ and $j$ given $\mathcal{R}_n$ is

(5.4) $\mathrm{Cov}(r_{i,n+1}, r_{j,n+1} \mid \mathcal{R}_n) = s_{i,n}(\boldsymbol{\gamma}_i)\,s_{j,n}(\boldsymbol{\gamma}_j)\,\mathrm{Cov}(z_{i,n+1}, z_{j,n+1} \mid \mathcal{R}_n)$

for the model (5.2) and (5.3). Note that (5.3) determines $s_{i,n}^2$ recursively from $\mathcal{R}_n$, and that $\mathbf{z}_{n+1}$ is independent of $\mathcal{R}_n$; therefore, its covariance matrix can be consistently estimated from the residuals $\hat{\mathbf{z}}_t$. Under (5.2) and (5.3), the NPEB approach uses the following formulas for $\boldsymbol{\mu}_n$ and $V_n$ in (3.5):

(5.5) $\boldsymbol{\mu}_n = (\hat{\boldsymbol{\beta}}_1^T\mathbf{x}_{1,n}, \ldots, \hat{\boldsymbol{\beta}}_m^T\mathbf{x}_{m,n})^T$, $\quad V_n = \boldsymbol{\mu}_n\boldsymbol{\mu}_n^T + (\hat s_{i,n}\hat s_{j,n}\hat\sigma_{ij})_{1 \le i,j \le m}$,

in which $\hat{\boldsymbol{\beta}}_i$ is the least squares estimate of $\boldsymbol{\beta}_i$, and $\hat s_{i,n}$ and $\hat\sigma_{ij}$ are the usual estimates of $s_{i,n}$ and $\mathrm{Cov}(z_{i1}, z_{j1})$ based on $\mathcal{R}_n$. Further discussion of time series modeling for implementing the optimal portfolio in Section 3 will be given in Sections 6.2 and 7.
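Formula (5.5) can be sketched directly: fit each asset's regression separately, collect the one-step predictions into $\boldsymbol{\mu}_n$, and combine the scales $\hat s_{i,n}$ with the sample covariance $\hat\sigma_{ij}$ of the standardized residuals. For brevity the sketch assumes an AR(1) mean equation and replaces the fitted GARCH scale by a constant residual standard deviation; the function name and the simulated data are hypothetical.

```python
import numpy as np


def npeb_moments(betas, x_next, s_next, z):
    """Formulas (5.5): mu_n and V_n = mu_n mu_n' + (s_i s_j sigma_ij).

    betas[i], x_next[i]: least squares coefficients and predictor x_{i,n};
    s_next[i]: estimated conditional scale s_{i,n};
    z: (n, m) standardized residuals used to estimate sigma_ij.
    """
    mu_n = np.array([b @ x for b, x in zip(betas, x_next)])
    sigma_z = np.cov(z, rowvar=False)
    v_n = np.outer(mu_n, mu_n) + np.outer(s_next, s_next) * sigma_z
    return mu_n, v_n


rng = np.random.default_rng(7)
n, m = 120, 3
common = rng.standard_normal(n + 1)  # induces cross-sectional correlation
returns = 0.01 + 0.04 * (0.6 * common[:, None] + 0.8 * rng.standard_normal((n + 1, m)))
betas, x_next, s_next, z_cols = [], [], [], []
for i in range(m):
    X = np.column_stack([np.ones(n), returns[:-1, i]])  # x_{i,t-1} = (1, r_{i,t-1})
    beta, *_ = np.linalg.lstsq(X, returns[1:, i], rcond=None)
    resid = returns[1:, i] - X @ beta
    scale = resid.std(ddof=1)  # constant scale in place of a fitted GARCH s_{i,n}
    betas.append(beta)
    x_next.append(np.array([1.0, returns[-1, i]]))
    s_next.append(scale)
    z_cols.append(resid / scale)
mu_n, v_n = npeb_moments(betas, x_next, np.array(s_next), np.column_stack(z_cols))
print(mu_n, v_n, sep="\n")
```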

6. An empirical study. In this section we describe an empirical study of 
the out-of-sample performance of the proposed approach and other meth- 
ods for mean-variance portfolio optimization when the means and covari- 
ances of the underlying asset returns are unknown. The study uses monthly 
stock market data from January 1985 to December 2009, which are obtained 
from the Center for Research in Security Prices (CRSP) database, and 
evaluates out-of-sample performance of different portfolios of these stocks 
for each month after the first ten years (120 months) of this period to 
accumulate training data. The CRSP database can be accessed through 
the Wharton Research Data Services at the University of Pennsylvania 
(http://wrds.wharton.upenn.edu). Following Ledoit and Wolf (2004), at 
the beginning of month t, with t varying from January 1995 to December 
2009, we select m = 50 stocks with the largest market values among those 
that have no missing monthly prices in the previous 120 months, which are 
used as the training sample. The portfolios for month t to be considered are 
formed from these m stocks. 
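The rolling selection of the stock universe can be sketched as follows. The sketch assumes monthly return and market-value panels indexed by month and stock, with NaN marking missing data, and ranks stocks by their market value in the month just before $t$; the exact timing of the market values, the function name and the toy data are assumptions.

```python
import numpy as np


def select_universe(returns, market_value, t, window=120, m=50):
    """Stocks for test month t: the m largest by market value among those with
    no missing monthly returns in months t - window, ..., t - 1.

    returns, market_value: (T, N) arrays with NaN marking missing data.
    Returns the selected column indices and their (window, m) training sample.
    """
    hist = returns[t - window:t]
    eligible = np.flatnonzero(~np.isnan(hist).any(axis=0))
    size = market_value[t - 1, eligible]  # latest value before month t (assumed)
    chosen = np.sort(eligible[np.argsort(size)[::-1][:m]])
    return chosen, hist[:, chosen]


returns = np.zeros((10, 6))
returns[7, 2] = np.nan  # stock 2 has a missing month
market_value = np.tile([1.0, 2.0, 10.0, 3.0, 4.0, 5.0], (10, 1))
print(select_universe(returns, market_value, t=9, window=4, m=3)[0])   # [3 4 5]
print(select_universe(returns, market_value, t=10, window=2, m=3)[0])  # [2 4 5]
```

In the toy example, stock 2 is the largest but is excluded while its missing month lies inside the training window.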

Note that this period contains highly volatile times in the stock market, 
such as around "Black Monday" in 1987, the Internet bubble burst and the 
September 11 terrorist attacks in 2001, and the "Great Recession" that be- 
gan in 2007 with the default and other difficulties of subprime mortgage 
loans. We use sliding windows of n = 120 months of training data to con- 
struct portfolios of the stocks for the subsequent month. In contrast to the 
Black-Litterman approach described in Section 2.2, the portfolio construc- 
tion is based solely on these data and uses no other information about the 
stocks and their associated firms, since the purpose of the empirical study 
is to illustrate the basic statistical aspects of the proposed method and 
to compare it with other statistical methods for implementing Markowitz's
mean-variance portfolio optimization theory. Moreover, for a fair compari- 
son, we do not assume any prior distribution as in the Bayes approach, and 
only use NPEB in this study. 

Performance of a portfolio is measured by its excess returns $e_t$ over a benchmark portfolio. As $t$ varies over the monthly test periods from January 1995 to December 2009, we can (i) add up the realized excess returns to give the cumulative realized excess return $\sum_{i=1}^t e_i$ up to time $t$, and (ii) use the average realized excess return and the standard deviation to evaluate the realized information ratio $\sqrt{12}\,\bar e/s_e$, where $\bar e$ is the sample average of the monthly excess returns and $s_e$ is the corresponding sample standard deviation, using $\sqrt{12}$ to annualize the ratio as in Ledoit and Wolf (2004). Since the realized information ratio is a summary statistic of the monthly excess returns in the 180 test periods, we find it more informative to supplement this commonly used measure of investment performance with the time series plot of cumulative realized excess returns, from which the realized excess returns $e_t$ can be retrieved by differencing.
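Both performance summaries are simple to compute from a series of monthly excess returns. This sketch assumes the sample standard deviation uses the $n - 1$ divisor; the function name and the four-month series are illustrative.

```python
import numpy as np


def realized_performance(excess):
    """Cumulative excess return, annualized mean and sd, and sqrt(12) * e_bar / s_e."""
    excess = np.asarray(excess, dtype=float)
    cumulative = np.cumsum(excess)
    e_bar = excess.mean()
    s_e = excess.std(ddof=1)  # sample standard deviation (n - 1 divisor assumed)
    return cumulative, 12 * e_bar, np.sqrt(12) * s_e, np.sqrt(12) * e_bar / s_e


monthly = np.array([0.01, -0.02, 0.03, 0.00])
cumulative, ann_mean, ann_sd, info_ratio = realized_performance(monthly)
# Differencing the cumulative series recovers the monthly excess returns.
recovered = np.diff(cumulative, prepend=0.0)
print(cumulative, ann_mean, ann_sd, info_ratio)
```

Note that the annualized ratio equals $12\bar e$ divided by $\sqrt{12}\,s_e$, which is why the same $\sqrt{12}$ factor appears in the annualized standard deviations reported later.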

We use two ways to construct the benchmark portfolio. The first follows 
that of Ledoit and Wolf (2004), who propose to mimic how an active portfolio 
manager chooses the benchmark to define excess returns. It is described in 
Section 6.1. The second simply uses the S&P500 Index as the benchmark 
portfolio and Section 6.3 considers this case. Section 6.2 compares the time 
series of the returns of these two benchmark portfolios and explains why we 
choose to use the S&P500 Index as the benchmark portfolio in conjunction with the time series model (5.2) and (5.3) for the excess returns in Section 6.3.

6.1. Active portfolios and associated optimization problems. In this section the benchmark portfolio consists of the $m = 50$ stocks chosen at the beginning of each test period, weighted by their market values. Let $\mathbf{w}_B$ denote the weight vector of this value-weighted benchmark and $\mathbf{w}$ the weight vector of a given portfolio. The difference $\tilde{\mathbf{w}} = \mathbf{w} - \mathbf{w}_B$ satisfies $\tilde{\mathbf{w}}^T\mathbf{1} = 0$. An active portfolio manager would choose $\tilde{\mathbf{w}}$ that solves the following optimization problem instead of (1.1):

(6.1) $\mathbf{w}_{\mathrm{active}} = \mathbf{w}_B + \arg\min_{\tilde{\mathbf{w}}} \tilde{\mathbf{w}}^T\Sigma\tilde{\mathbf{w}}$ subject to $\tilde{\mathbf{w}}^T\boldsymbol{\mu} = \tilde\mu_*$, $\tilde{\mathbf{w}}^T\mathbf{1} = 0$ and $\tilde{\mathbf{w}} \in C$,

in which $C$ represents additional constraints for the manager, $\Sigma$ is the covariance matrix of stock returns and $\tilde\mu_*$ is the target excess return over the value-weighted benchmark. The portfolio defined by $\mathbf{w}_{\mathrm{active}}$ is called an active portfolio. Since $\boldsymbol{\mu}$ and $\Sigma$ are typically unknown, putting a prior distribution on them in (6.1) leads to the following modification of (3.1):

(6.2) $\max\{E(\tilde{\mathbf{w}}^T\mathbf{r}_{n+1}) - \lambda\mathrm{Var}(\tilde{\mathbf{w}}^T\mathbf{r}_{n+1})\}$ subject to $\tilde{\mathbf{w}}^T\mathbf{1} = 0$.
Table 3

Means and standard deviations (in parentheses) of the annualized realized excess returns over the value-based benchmark

$\tilde\mu_*$     0.01             0.015            0.02             0.03
λ (NPEB)          2^?              2                ?                2^-?

(a) All test periods, with portfolios re-defined in some periods

Plug-in           0.001 (4.7e-3)   0.002 (7.3e-3)   0.003 (9.6e-3)   0.007 (1.4e-2)
Shrink            0.003 (4.3e-3)   0.004 (6.6e-3)   0.006 (8.8e-3)   0.011 (1.3e-2)
Boot              0.001 (2.5e-3)   0.001 (3.8e-3)   0.001 (5.1e-3)   0.003 (7.3e-3)
NPEB              0.029 (1.2e-1)   0.046 (1.3e-1)   0.053 (1.5e-1)   0.056 (1.6e-1)

(b) Test periods in which all portfolios are well defined

Plug-in           0.002 (6.6e-3)   0.004 (1.0e-2)   0.006 (1.4e-2)   0.014 (1.9e-2)
Shrink            0.005 (5.9e-3)   0.008 (9.0e-3)   0.012 (1.2e-2)   0.021 (1.8e-2)
Boot              0.001 (3.5e-3)   0.003 (5.3e-3)   0.003 (7.1e-3)   0.006 (1.0e-2)
NPEB              0.282 (9.3e-2)   0.367 (1.1e-1)   0.438 (1.1e-1)   0.460 (1.1e-2)



This optimization problem can be solved by the same method as that intro- 
duced in Section 3. 
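When the extra constraint set $C$ is dropped, (6.1) with given $\boldsymbol{\mu}$ and $\Sigma$ reduces to a linear KKT system, which the following sketch solves; with $C$ present (long-only positions and caps), a quadratic programming solver is needed instead. The function name, the covariance matrix, the mean vector and the market values are made up for illustration.

```python
import numpy as np


def active_weights(mu, sigma, w_bench, target):
    """Solve (6.1) without C via its KKT system:
    min w~' Sigma w~  s.t.  w~' mu = target,  w~' 1 = 0;  return w_B + w~.
    """
    m = len(mu)
    kkt = np.zeros((m + 2, m + 2))
    kkt[:m, :m] = 2.0 * sigma
    kkt[:m, m] = kkt[m, :m] = mu
    kkt[:m, m + 1] = kkt[m + 1, :m] = 1.0
    rhs = np.concatenate([np.zeros(m), [target, 0.0]])
    w_tilde = np.linalg.solve(kkt, rhs)[:m]
    return w_bench + w_tilde, w_tilde


rng = np.random.default_rng(3)
a = rng.standard_normal((5, 5))
sigma = a @ a.T / 5 + 0.01 * np.eye(5)  # made-up positive definite covariance
mu = rng.normal(0.01, 0.005, size=5)
caps = rng.uniform(1.0, 3.0, size=5)
w_bench = caps / caps.sum()  # value weights from made-up market values
w_active, w_tilde = active_weights(mu, sigma, w_bench, target=0.002)
print(np.round(w_active, 3))
```

The solution exists as long as $\Sigma$ is positive definite and $\boldsymbol{\mu}$ is not proportional to $\mathbf{1}$; the resulting $\mathbf{w}_{\mathrm{active}}$ still sums to one because $\tilde{\mathbf{w}}$ sums to zero.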

Following Ledoit and Wolf (2004), we choose the constraint set $C$ such that the portfolio is long only and the total position in any stock cannot exceed an upper bound $c$, that is, $C = \{\tilde{\mathbf{w}} : -\mathbf{w}_B \le \tilde{\mathbf{w}} \le c\mathbf{1} - \mathbf{w}_B\}$, with $c = 0.1$. We use quadratic programming to solve the optimization problem (6.1) in which $\boldsymbol{\mu}$ and $\Sigma$ are replaced, for the plug-in active portfolio, by their sample estimates based on the training sample in the past 120 months. The covariance-shrinkage active portfolio uses a shrinkage estimator of $\Sigma$ instead, shrinking toward a patterned matrix that assumes all pairwise correlations to be equal [Ledoit and Wolf (2003)]. Similarly, we can extend Section 2.3 to obtain a resampled active portfolio, and also extend the NPEB approach in Section 4 to construct the corresponding NPEB active portfolio. Table 3 summarizes the annualized means and standard deviations of the realized excess returns, which determine the realized information ratio $\sqrt{12}\,\bar e/s_e$, for different values of the annualized target excess return $\tilde\mu_*$ and "matching" values of $\lambda$ whose choice is described below.
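The patterned shrinkage target can be sketched as follows: keep the sample variances, replace every pairwise correlation by their average, and take a convex combination with the sample covariance matrix. Ledoit and Wolf also estimate an optimal shrinkage intensity from the data; that step is omitted here, and `delta` is simply passed in as a hypothetical parameter. This is not their published software, and the function names and simulated returns are illustrative.

```python
import numpy as np


def constant_correlation_target(sample_cov):
    """Patterned matrix with the sample variances and one common correlation,
    the average of the pairwise sample correlations."""
    sd = np.sqrt(np.diag(sample_cov))
    corr = sample_cov / np.outer(sd, sd)
    m = len(sd)
    r_bar = corr[~np.eye(m, dtype=bool)].mean()
    target = r_bar * np.outer(sd, sd)
    np.fill_diagonal(target, sd ** 2)
    return target


def shrink_covariance(sample_cov, delta):
    """Convex combination delta * F + (1 - delta) * S with a given intensity."""
    return delta * constant_correlation_target(sample_cov) + (1.0 - delta) * sample_cov


rng = np.random.default_rng(5)
returns = rng.standard_normal((120, 50)) * 0.05  # 120 months, 50 stocks
sample_cov = np.cov(returns, rowvar=False)
shrunk = shrink_covariance(sample_cov, delta=0.5)  # delta chosen arbitrarily here
```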

We first note that specified target returns may be vacuous for the plug-in, covariance-shrinkage (abbreviated "Shrink" in Table 3) and resampled (abbreviated "Boot" for bootstrapping) active portfolios in a given test period. For $\tilde\mu_* = 0.01$, 0.015, 0.02 and 0.03, there are 92, 91, 91 and 80 test periods, respectively, for which (6.1) has solutions when $\Sigma$ is replaced by either the sample covariance matrix or the Ledoit-Wolf shrinkage estimator of the training data from the previous 120 months. Higher levels of target returns result in even fewer of the 180 test periods for which (6.1) has solutions. On the other hand, values of $\tilde\mu_*$ lower than 1% may be of little practical interest to active portfolio managers. When (6.1) does not have a solution that provides a portfolio of a specified type for a test period, we use the value-weighted benchmark as the portfolio for that test period. Table 3(a) gives the actual (annualized) mean realized excess returns $12\bar e$, to show the extent to which they match the target value $\tilde\mu_*$, and also the corresponding annualized standard deviations $\sqrt{12}\,s_e$, over the 180 test periods for the plug-in, covariance-shrinkage and resampled active portfolios constructed with the above modification. These numbers are very small, showing that the three portfolios differ little from the benchmark portfolio, so the realized information ratios, which range from 0.24 to 0.83 for these active portfolios, can be quite misleading if the actual mean excess returns are not taken into consideration.
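Whether a target is attainable under $C$ can be checked before any quadratic programming: with $0 \le \mathbf{w} \le c\mathbf{1}$ and $\mathbf{w}^T\mathbf{1} = 1$, the largest (smallest) achievable $\mathbf{w}^T\hat{\boldsymbol{\mu}}$ comes from filling the stocks with the largest (smallest) estimated means up to the cap, and every value in between is attainable. The sketch below uses this to decide between solving (6.1) and falling back to the value-weighted benchmark. It assumes $mc \ge 1$ and that the target is expressed in the same (per-period) units as $\hat{\boldsymbol{\mu}}$; the function names and the estimated means are hypothetical.

```python
import numpy as np


def extreme_weights(mu, cap, largest=True):
    """Long-only weights with sum 1 and w_i <= cap that maximize (or minimize)
    w'mu: fill the assets with the largest (smallest) mu up to the cap."""
    order = np.argsort(mu)[::-1] if largest else np.argsort(mu)
    w = np.zeros(len(mu))
    budget = 1.0
    for i in order:
        if budget <= 1e-12:
            break
        w[i] = min(cap, budget)
        budget -= w[i]
    return w


def active_target_attainable(mu_hat, w_bench, target, cap=0.1):
    """Is there w~ in C = {-w_B <= w~ <= cap*1 - w_B}, sum(w~) = 0, with
    w~' mu_hat = target? The attainable values form an interval."""
    lo = (extreme_weights(mu_hat, cap, largest=False) - w_bench) @ mu_hat
    hi = (extreme_weights(mu_hat, cap, largest=True) - w_bench) @ mu_hat
    return lo <= target <= hi


mu_hat = np.linspace(0.02, -0.02, 50)  # made-up estimated mean returns
w_bench = np.full(50, 1 / 50)
for target in (0.01, 0.02):
    if active_target_attainable(mu_hat, w_bench, target):
        print(target, "attainable: solve (6.1) by quadratic programming")
    else:
        print(target, "not attainable: fall back to the value-weighted benchmark")
```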

We have also tried another default option for test periods in which (6.1) does not have a solution: take the 10 stocks (among the 50 selected stocks) with the largest mean returns over the training period and put equal weights on them to form the portfolio for the ensuing test period. The mean realized excess returns $12\bar e$ when this default option is used are all negative (between -17.4% and -16.3%), while $\tilde\mu_*$ ranges from 1% to 3%. Table 3(a) also gives the means and standard deviations of the annualized realized excess returns of the NPEB active portfolio for four values of $\lambda$, chosen so that the mean realized excess returns roughly match the values of $\tilde\mu_*$ over the grid $\lambda = 2^j$ $(-2 \le j \le 2)$ that we have tried. Note that NPEB has considerably larger mean excess returns than the other three portfolios.

Table 3(b) restricts attention to the 80-92 test periods in which the plug-in, covariance-shrinkage and resampled active portfolios are all well defined by (6.1) for $\tilde\mu_* = 0.01$, 0.015, 0.02 and 0.03. The mean excess returns of the plug-in, covariance-shrinkage and resampled portfolios are still very small, while those of NPEB are much larger. When we restrict to these test periods, the realized information ratios of NPEB range from 3.015 to 3.954, while those of the other three portfolios range from 0.335 to 1.214.

6.2. Value-weighted portfolio versus S&P500 Index and time series effects. The results for the plug-in and covariance-shrinkage portfolios in Table 3 are markedly different from those of Ledoit and Wolf (2004), which cover a different period (February 1983-December 2002). This suggests that the stock returns cannot be approximated by the i.i.d. model underlying these methods. In Section 5.3 we have extended the NPEB approach to a very flexible time series model (5.2) and (5.3) of the stock returns $r_{it}$. The stochastic regression model (5.2) can incorporate important time-varying predictors in $\mathbf{x}_{it}$ for the $i$th stock's performance at time $t$, while the GARCH model (5.3) for the random disturbances $\epsilon_{it}$ in (5.2) can incorporate dynamic features of the stock's idiosyncratic variability. It seems that a regressor such as the return $u_t$ of the S&P500 Index should be included in $\mathbf{x}_{it}$ to take advantage of the co-movements of $r_{it}$ and $u_t$. However, since $u_t$ is not observed at time $t$, one may need good predictors of $u_t$, which should consist not only of past S&P500 returns but also of macroeconomic variables. Of course, stock-specific information such as the firm's earnings performance and forecast and its sector's economic outlook should also be considered. This means that fundamental analysis, as carried out by professional stock analysts and economists in investment banks, should be incorporated into the model (5.2). Since this is clearly beyond the scope of the present empirical study, whose purpose is to illustrate our new statistical approach to the Markowitz optimization enigma, we shall focus on simple models to demonstrate the benefit of building good models for $\mathbf{r}_{t+1}$ in our stochastic optimization approach.

Fig. 2. Time series plots of the monthly returns (Top) of the S&P500 Index and the value-weighted benchmark, and cumulative excess returns (Bottom) of the value-weighted benchmark (abbreviated Bench) and of the active portfolios (abbreviated by the subscript "act") over the S&P500 Index.

In this connection, we first compare the S&P500 Index with the value-weighted portfolio, which is the benchmark portfolio in Section 6.1. The top panel of Figure 2 gives the time series plots of the monthly returns (not annualized) of both portfolios during the test period. The S&P500 Index has mean 0.006 and standard deviation 0.046 in this period, while the value-weighted portfolio has mean 0.0137 and standard deviation 0.045. The bottom panel of Figure 2 plots the time series of cumulative realized excess returns $\sum_{i=1}^t e_i$ over the S&P500 Index for the value-weighted portfolio and also for the four active portfolios in Table 3(a) under the column $\tilde\mu_* = 0.015$ and $\lambda = 2$, during the test period (January 1995-December 2009). Unlike NPEB$_{\mathrm{act}}$, the other three active portfolios have cumulative realized excess returns that differ little from those of the value-weighted portfolio, as shown by the figure.

Fig. 3. Comparison of returns and excess returns. Top panel: returns of S&P500 Index (black) and SHW (red). Middle panel: excess returns (blue) of SHW. Bottom panel: autocorrelations of returns (red) and excess returns (blue) of SHW; the dotted lines represent rejection boundaries of 5%-level tests of zero autocorrelation at the indicated lag.

In view of the structural changes in the economy and the financial markets during this period, it appears difficult to find simple time series models that can reflect the inherent nonstationarity. If we use the S&P500 Index $u_t$ as an alternative benchmark to the value-weighted portfolio used in Section 6.1, the excess returns $e_{it} = r_{it} - u_t$ may be able to exploit the co-movements of $r_{it}$ and $u_t$ to remove their common nonstationarity due to changes in macroeconomic variables. As an illustration, the top panel of Figure 3 gives the time series plots of returns of Sherwin-Williams Co. (SHW) and of the S&P500 Index during this period, and the middle panel gives the time series plot of the excess returns. The Ljung-Box test, which involves autocorrelations of lags up to 20 months, has $p$-value 0.001 for the monthly returns of SHW and 0.267 for the excess returns, and therefore rejects the i.i.d. assumption for the actual returns but not for the excess returns; see Section 5.1 of Lai and Xing (2008). This is also shown graphically by the autocorrelation functions in the bottom panel of Figure 3.
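The Ljung-Box statistic with lags up to 20 is $Q = n(n+2)\sum_{k=1}^{20}\hat\rho_k^2/(n-k)$, compared with a chi-square distribution with 20 degrees of freedom. The sketch below computes $Q$ with NumPy and compares it with the 5% critical value 31.410; if SciPy is available, `scipy.stats.chi2.sf(q, 20)` gives the $p$-value. The series are simulated (white noise and a strongly autocorrelated AR(1) series of the same length as the 180 test months) and are not the SHW data.

```python
import numpy as np

CHI2_20_95 = 31.410  # 95% quantile of the chi-square distribution with 20 df


def ljung_box(x, max_lag=20):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1}^{h} rho_k^2 / (n - k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = d @ d
    lags = np.arange(1, max_lag + 1)
    rho = np.array([d[k:] @ d[:-k] / denom for k in lags])
    return n * (n + 2) * np.sum(rho ** 2 / (n - lags))


rng = np.random.default_rng(11)
noise = rng.standard_normal(180)
ar = np.empty(180)
ar[0] = noise[0]
for t in range(1, 180):
    ar[t] = 0.8 * ar[t - 1] + noise[t]  # strongly autocorrelated series
for name, series in (("white noise", noise), ("AR(1)", ar)):
    q = ljung_box(series)
    print(name, round(q, 1), "reject" if q > CHI2_20_95 else "do not reject")
```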

6.3. Using the S&P500 Index as benchmark portfolio and time series models of excess returns. The preceding section shows that using the S&P500 Index as the benchmark portfolio has certain advantages over the value-weighted portfolio. In this section we consider the excess returns $e_{it}$ over the S&P500 Index $u_t$, which we use as the benchmark portfolio, and fit relatively simple time series models to the training sample to predict the mean and volatility of $e_{it}$ for the test period. Instead of forming active portfolios as in Section 6.1, we follow traditional portfolio theory as described in Sections 1-3. Note that this theory assumes the constraint $\sum_i w_i = 1$ and, therefore,

(6.3) $\sum_{i=1}^m w_i r_{it} - u_t = \sum_{i=1}^m w_i (r_{it} - u_t) = \sum_{i=1}^m w_i e_{it}$,

whereas active portfolio optimization considers weights $\tilde w_i = w_i - w_{i,B}$ that satisfy the constraint $\sum_i \tilde w_i = 0$. In view of (6.3), when the objective is to maximize the mean return of the portfolio subject to a constraint on the volatility of the excess return over the benchmark (which is related to achieving an optimal information ratio), we can replace the returns $r_{it}$ by the excess returns $e_{it}$ in the portfolio optimization problem (1.1) or (3.1). As explained in the second paragraph of Section 6.2, $e_{it}$ can be modeled by simpler stationary time series models than $r_{it}$.

The simplest time series model to try is the AR(1) model $e_{it} = \alpha_i + \gamma_i e_{i,t-1} + \epsilon_{it}$. Assuming this time series model for the excess returns, we can apply the NPEB procedure in Section 5.3 to the training sample and thereby obtain the NPEB$_{\mathrm{AR}}$ portfolio for the test sample. The AR(1) model uses $\mathbf{x}_{i,t-1} = (1, e_{i,t-1})^T$ as the predictor in a linear regression model for $e_{i,t}$. To improve prediction performance, one can include additional predictor variables, for example, the return $u_{t-1}$ of the S&P500 Index in the preceding period. Assuming the stochastic regression model $e_{i,t} = (1, e_{i,t-1}, u_{t-1})\boldsymbol{\beta}_i + \epsilon_{i,t}$ and the GARCH(1, 1) model (5.3) for $\epsilon_{i,t}$, we can apply the NPEB procedure to the training sample and thereby form the NPEB$_{\mathrm{SRG}}$ portfolio for the test sample.
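The mean equations behind NPEB$_{\mathrm{AR}}$ and NPEB$_{\mathrm{SRG}}$ differ only in their regressors, as this sketch shows: least squares on $(1, e_{i,t-1}, u_{t-1})$, or on $(1, e_{i,t-1})$ for the AR(1) case, followed by the one-step-ahead prediction that would enter $\boldsymbol{\mu}_n$ in (5.5). The GARCH step is omitted. The data are generated without noise from a known model purely so that the fit can be checked; the function name and coefficients are hypothetical.

```python
import numpy as np


def srg_forecast(e, u, use_index=True):
    """Fit e_t = (1, e_{t-1}, u_{t-1}) beta + eps_t by least squares and
    return beta and the one-step-ahead prediction of the next e.
    With use_index=False this is the AR(1) model with x_{t-1} = (1, e_{t-1})."""
    cols = [np.ones(len(e) - 1), e[:-1]] + ([u[:-1]] if use_index else [])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, e[1:], rcond=None)
    x_next = np.array([1.0, e[-1]] + ([u[-1]] if use_index else []))
    return beta, x_next @ beta


rng = np.random.default_rng(2)
u = rng.normal(0.005, 0.04, size=120)  # made-up S&P500 returns
e = np.empty(120)
e[0] = 0.01
for t in range(1, 120):  # noise-free excess returns from a known SRG model
    e[t] = 0.001 + 0.2 * e[t - 1] + 0.3 * u[t - 1]
beta, pred = srg_forecast(e, u)
print(np.round(beta, 6), pred)
```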

Instead of taking long-only positions (i.e., $w_i \ge 0$ for all $i$), we also allow short selling, with the constraint $w_i \ge -0.05$ for all $i$, to construct the following portfolios in this section. For the plug-in, covariance-shrinkage and resampled portfolios, which we abbreviate as in Figure 2 but without the subscript "act" (for active), we use the annualized target returns $\mu_* = 0.015$, 0.02 and 0.03, for which the problem (1.1) can be solved for all 180 test periods under the weight constraint; note that we use the mean return instead of the mean excess return as the target $\mu_*$. For the NPEB$_{\mathrm{AR}}$ and NPEB$_{\mathrm{SRG}}$ portfolios, we use the training sample as in Section 5.1 to choose $\lambda$ by maximizing the information ratio over the grid $\lambda \in \{2^i : i = -3, -2, \ldots, 6\}$. Figure 4 plots the time series of cumulative realized excess returns over the S&P500 Index during the test period of 180 months, for Plug-in, Shrink and Boot with $\mu_* = 0.015$ and for NPEB$_{\mathrm{AR}}$ and NPEB$_{\mathrm{SRG}}$. Table 4 gives the annualized realized information ratios and, in square brackets, the average realized excess returns, with the S&P500 Index as the benchmark portfolio. The table also considers the cases $\mu_* = 0.02$ and 0.03, and further abbreviates Plug-in, Shrink and Boot by P, S and B, respectively.

Fig. 4. Realized cumulative excess returns over the S&P500 Index.

6.4. Discussion. Our approach may perform much better if the investor can combine domain knowledge with the statistical modeling that we illustrate here. We have not done this in the present comparative study, which builds the prediction model (5.2) from a purely empirical analysis of the past returns of these stocks; such an analysis does not do full justice to the power and versatility of the proposed approach, which is developed in Section 3 in a general Bayesian framework, allowing the skillful investor to make use of prior beliefs on the future return vector $\mathbf{r}_{n+1}$ and statistical models for predicting $\mathbf{r}_{n+1}$ from past market data. The prior beliefs can involve both the investor's and the market's "views," as in the Black-Litterman approach described in Section 2.2, for which the market's view is implied by the equilibrium portfolio. Note that Black and Litterman model the potential errors of these views by normal priors whose covariance matrices reflect the uncertainties. Our Bayesian approach goes one step further to account for these uncertainties by using the actual means and variances of the portfolio's return in the optimization problem (3.1), instead of the estimated means and variances in the plug-in approach.

Table 4

Realized information ratios and average realized excess returns (in square brackets) with respect to the S&P500 Index

Portfolio     $\mu_* = 0.015$    $\mu_* = 0.02$     $\mu_* = 0.03$
P             0.527 [0.078]      0.532 [0.078]      0.538 [0.076]
S             0.352 [0.052]      0.353 [0.051]      0.354 [0.050]
B             0.618 [0.077]      0.629 [0.078]      0.625 [0.077]

NPEB, AR      0.370 [0.283]
NPEB, SRG     1.169 [0.915]

A portfolio on Markowitz's efficient frontier can be interpreted as a
minimum-variance portfolio achieving a target mean return, or a maximum-mean
portfolio at a given volatility (i.e., standard deviation of returns). Portfolio
managers prefer the former interpretation, as target returns are appealing to
investors. In active portfolio management [Grinold and Kahn (2000)], this
has led to the target excess return $\mu_*$ and the optimization problem (6.1).
The empirical study in Section 6.1 shows that when the means and covariances
of the stock returns are unknown and are estimated from historical
data, putting these estimates in (6.1) may not provide a solution; moreover,
the actual mean of the solution (when it exists) can differ substantially
from $\mu_*$.
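The first interpretation, a minimum-variance portfolio for a target mean, can be sketched with the textbook Lagrange-multiplier solution, together with a check of why a target can be unattainable once short selling is ruled out. This is a simplification under stated assumptions: the function names and inputs are hypothetical, returns are raw rather than excess over a benchmark, only a budget constraint is imposed in the closed form, and the constraints of (6.1) may differ.

```python
import numpy as np


def min_variance_for_target(mu, Sigma, target):
    """Minimum-variance weights subject to sum(w) = 1 and w'mu = target.

    Textbook closed form via Lagrange multipliers; short selling is allowed and
    mu, Sigma are treated as known (or as plug-in estimates).
    """
    ones = np.ones_like(mu)
    Si_1 = np.linalg.solve(Sigma, ones)   # Sigma^{-1} 1
    Si_mu = np.linalg.solve(Sigma, mu)    # Sigma^{-1} mu
    A, B, C = ones @ Si_1, ones @ Si_mu, mu @ Si_mu
    D = A * C - B ** 2                    # > 0 unless all means are equal
    return ((C - B * target) * Si_1 + (A * target - B) * Si_mu) / D


def long_only_target_feasible(mu, target):
    """With w >= 0 and sum(w) = 1, w'mu takes exactly the values in [min(mu), max(mu)]."""
    return bool(np.min(mu) <= target <= np.max(mu))


# Illustrative (made-up) inputs for three assets; not data from the paper.
mu = np.array([0.010, 0.015, 0.020])
Sigma = np.array([[0.0025, 0.0005, 0.0000],
                  [0.0005, 0.0040, 0.0010],
                  [0.0000, 0.0010, 0.0090]])

w = min_variance_for_target(mu, Sigma, target=0.018)
print("weights:", w, "mean:", w @ mu, "variance:", w @ Sigma @ w)
print("long-only target 0.03 feasible?", long_only_target_feasible(mu, 0.03))
```

With estimated inputs, a long-only target above the largest estimated mean has no solution at all, which is one simple way the plug-in version of a target-return problem can fail.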

7. Concluding remarks. The "Markowitz enigma" has been attributed
to (a) sampling variability of the plug-in weights (hence the use of resampling to
correct for bias due to nonlinearity of the weights as a function of the mean
vector and covariance matrix of the stocks) or (b) inherent difficulties of
estimating high-dimensional covariance matrices in the plug-in approach.
Like the plug-in approach, subsequent refinements that attempt to address
(a) or (b) still follow Markowitz's solution for efficient portfolios closely,
constraining the unknown mean to equal some target return. This tends
to result in relatively low information ratios when no or limited short selling
is allowed, as noted in Sections 4.3 and 6. Another difficulty with the
plug-in and shrinkage approaches is that their measure of "risk" does not account
for the uncertainties in the parameter estimates. Incorporating these
uncertainties via a Bayesian approach results in a much harder stochastic
optimization problem than Markowitz's deterministic optimization problem,
which we have been able to solve by introducing an additional parameter $\eta$.
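One way to see how a single extra scalar makes the variance criterion tractable is the following linearization argument, a sketch in assumed notation that may differ in detail from the formulation in Section 3. Let $W = \mathbf{w}^\top \mathbf{r}_{n+1}$, where $\mathbf{w}$ may depend on the training sample, let $\lambda > 0$ be the risk-aversion weight, and let $\mathbf{w}^*$ be a maximizer of $E(W) - \lambda \operatorname{Var}(W)$ with $(x^*, y^*) = (E(W^*), E(W^{*2}))$:

```latex
\begin{aligned}
E(W) - \lambda \operatorname{Var}(W) &= h\bigl(E(W),\, E(W^2)\bigr),
  \qquad h(x, y) = x - \lambda y + \lambda x^2, \\
h(x, y) &\ge h(x^*, y^*) + (1 + 2\lambda x^*)(x - x^*) - \lambda (y - y^*)
  \quad \text{(convexity of } h\text{)}, \\
(1 + 2\lambda x^*)\, E(W) - \lambda E(W^2) &= \lambda\, E\bigl(\eta W - W^2\bigr),
  \qquad \eta = \tfrac{1}{\lambda} + 2 E(W^*).
\end{aligned}
```

If some weight rule gave a larger value of the linear part than $\mathbf{w}^*$, the inequality would give it a larger $h$ as well, contradicting optimality; so $\mathbf{w}^*$ also maximizes $E(\eta W - W^2)$ for that $\eta$. Since $E(W^*)$ is not known in advance, $\eta$ is treated as an additional parameter found by a one-dimensional search, and each inner problem involves only ordinary expectations.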

Our solution of this stochastic optimization problem opens up new possibilities
in extending Markowitz's mean-variance portfolio optimization theory
to the case where the means and covariances of the asset returns for
the next investment period are unknown. As pointed out in Section 5.3,
our solution only requires the posterior mean and second moment matrix
of the return vector for the next period, and one can combine
Black-Litterman-type expert views with statistical modeling to develop Bayesian
or empirical Bayes models with good predictive properties, for example, by
using (5.2) with suitably chosen $\mathbf{x}_{i,t-1}$.
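The remark that only the posterior mean and second moment matrix are needed can be illustrated with a small sketch. It assumes a budget constraint $\mathbf{1}^\top \mathbf{w} = 1$ with short selling allowed, conditions on the training sample (so the weights are a deterministic function of the given moments), and searches a simple grid of $\eta$ values; the function names and inputs are hypothetical, and this is not the authors' Matlab implementation. For each $\eta$ it maximizes $\eta\, \mathbf{w}^\top \boldsymbol{\mu} - \mathbf{w}^\top \mathbf{V} \mathbf{w}$ in closed form, then keeps the $\eta$ whose weights give the best value of $E(W) - \lambda \operatorname{Var}(W)$.

```python
import numpy as np


def eta_weights(mu, V, eta):
    """Maximize eta * w'mu - w'Vw subject to sum(w) = 1 (short selling allowed).

    mu: posterior mean of next-period returns.
    V:  posterior second moment matrix E[r r'] (not the covariance matrix).
    """
    ones = np.ones_like(mu)
    Vi_mu = np.linalg.solve(V, mu)
    Vi_1 = np.linalg.solve(V, ones)
    # Stationarity: eta*mu - 2 V w - gamma*1 = 0, with gamma chosen so that sum(w) = 1.
    gamma = (eta * (ones @ Vi_mu) - 2.0) / (ones @ Vi_1)
    return (eta * Vi_mu - gamma * Vi_1) / 2.0


def mv_criterion(w, mu, V, lam):
    """E(W) - lam * Var(W) for W = w'r, using Var(W) = w'Vw - (w'mu)^2."""
    m = w @ mu
    return m - lam * (w @ V @ w - m ** 2)


def eta_search(mu, V, lam, etas):
    """Solve the eta-problem on a grid and keep the eta with the best criterion."""
    best_eta = max(etas, key=lambda e: mv_criterion(eta_weights(mu, V, e), mu, V, lam))
    return best_eta, eta_weights(mu, V, best_eta)


def direct_weights(mu, V, lam):
    """Direct maximizer of w'mu - lam * w'Sw with S = V - mu mu', used as a check."""
    S = V - np.outer(mu, mu)  # predictive covariance of r
    ones = np.ones_like(mu)
    Si_mu = np.linalg.solve(S, mu)
    Si_1 = np.linalg.solve(S, ones)
    gamma = (ones @ Si_mu - 2.0 * lam) / (ones @ Si_1)
    return (Si_mu - gamma * Si_1) / (2.0 * lam)


# Illustrative (made-up) posterior moments for three assets; not data from the paper.
mu = np.array([0.010, 0.015, 0.020])
Sigma = np.array([[0.0025, 0.0005, 0.0000],
                  [0.0005, 0.0040, 0.0010],
                  [0.0000, 0.0010, 0.0090]])
V = Sigma + np.outer(mu, mu)  # second moment matrix = covariance + mean outer product
lam = 5.0

eta, w_eta = eta_search(mu, V, lam, np.linspace(0.0, 1.0, 10001))
w_dir = direct_weights(mu, V, lam)
print("eta:", eta, "eta-search weights:", w_eta, "direct weights:", w_dir)
```

In this simplified conditional setting the criterion is an ordinary quadratic in $\mathbf{w}$ with covariance $\mathbf{V} - \boldsymbol{\mu}\boldsymbol{\mu}^\top$, so the grid search should agree with `direct_weights` up to grid resolution; that agreement serves as a sanity check, while the $\eta$ device matters more when the expectation is also taken over the data that determine $\mathbf{w}$.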
SUPPLEMENTARY MATERIAL

Supplement: Matlab implementation of the NPEB method

(DOI: 10.1214/10-AOAS422SUPP; .zip). The source code of our approach
is provided.